ActiveRecord::Querying#find_by_sql has a confusing interface?

Hi,

We happened to need #find_by_sql today and noticed that it appears to have a somewhat confusing interface/method definition/documentation.

The method definition is:

def find_by_sql(sql, binds = [])

end

Which would seem to imply that you should use it like this:

Post.find_by_sql(“SELECT * FROM posts WHERE id = ?”, [1])

However, that is wrong, you actually need to do (and the examples in the documentation are like this):

Post.find_by_sql([“SELECT * FROM posts WHERE id = ?”, 1])

I spent some time reading the documentation and code and can’t figure out what the binds argument should be used for.

Maybe some sort of example or explanation should be added to the #find_by_sql documentation? I’d do so myself, but I’m having trouble figuring it out…

— Matias

You are incorrect the documentation is accurate.

You need to have a deeper understanding of ruby, and specifically the magic of the splat (*) operator.

You do not need to pass all the arguments inside of an array, the correct syntax is:

Post.find_by_sql(“SELECT * FROM posts WHERE id = ?”, 1)

‘binds’ is just what you think it is, the variables assigned to the replacements noted by the ? characters. If you had two, it would look like so:

Post.find_by_sql(“SELECT * FROM posts WHERE id = ? AND foo != ?”, 1, 2)

This works because of how the splat operator works under the hood, taking the arity past the first argument and turning it into an array on the receiving side of the called method (find_by_sql)

You should have a deeper understanding of the advanced parts of Ruby (like splats) before moving into Rails.

The documentation is standard according to how Ruby is documented.

-Jason

You do not need to pass all the arguments inside of an array, the correct syntax is:

Post.find_by_sql(“SELECT * FROM posts WHERE id = ?”, 1)

And yet that’s not what the document examples do…

Does the method doc not show the splat operator? Because in the quoted method doc, the * isn’t shown, which would lead to confusion even if that’s what being used under the hood.

Oh I see what you mean. You’re saying that the docs example show this:

Post.find_by_sql ["SELECT title FROM posts WHERE author = ? AND created > ?", author_id, start_date]
Post.find_by_sql ["SELECT body FROM comments WHERE author = :user_id OR approved_by = :user_id", { :user_id => user_id }]

But again, knowledge of Ruby tells me that the is no programatic difference between calling a method with an array and passing arguments to an array. (Unless this has changed in Ruby, I’m pretty sure ruby basically treats these as the very same thing)

I see that in other parts of the Ruby documentation, the splat is shown in the method title (which is really just an arbitrary thing for us the developers since you never actually write this)

As in here: http://api.rubyonrails.org/v2.3.8/classes/ActiveRecord/Base.html

attr_accessible(*attributes)

So in this case I would say that the correct documentation should be:

find_by_sql(sql, *binds = [])

You do not need to pass all the arguments inside of an array, the correct syntax is:
Post.find_by_sql(“SELECT * FROM posts WHERE id = ?”, 1)

I wonder if you actually tried that in console? Because I did, with this result:

[1] pry(main)> Post.find_by_sql(“SELECT * FROM posts WHERE id = ?”, 1)

NoMethodError: undefined method `empty?’ for 1:Fixnum

from /Users/matt/.rvm/gems/ruby-2.2.1/gems/activerecord-4.1.10/lib/active_record/connection_adapters/abstract_adapter.rb:389:in `without_prepared_statement?’

You should have a deeper understanding of the advanced parts of Ruby (like splats) before moving into Rails.

Having used Ruby professionally for the past five years or so, I have pretty good understanding of basic Ruby behaviour (like splats), so your snark is unwarranted (and not a good way to make people feel wanted in this community).

So in this case I would say that the correct documentation should be:
find_by_sql(sql, *binds = [])

Except that’s not what the method definition actually is. There is no splat in the definition. If there was, the behaviour would be different (if it wasn’t for the fact that def find_by_sql(sql, *binds=[]) would be invalid ruby).

@Jason, I don’t think you’re correct about the way Ruby functions. Splats are not implicit; they must be declared in the method definition. For example:

irb(main):001:0> def t(x, y=[])

irb(main):002:1> puts x

irb(main):003:1> puts y.inspect

irb(main):004:1> end

=> :t

irb(main):005:0> t(1, 2)

1

2

=> nil

irb(main):006:0> t(1,2,3)

ArgumentError: wrong number of arguments (3 for 1…2)

from (irb):1:in `t'

from (irb):6

So clearly if you call a method like that with two arguments, the second will not be coerced into an array like it would if you used splat, and further the method call fails if you pass more than 2 arguments. Hence splat is not implicit.

And the method definition for find_by_sql does not define a splat (see rails/activerecord/lib/active_record/querying.rb) so it cannot possibly function the way you claim it does. Have you tested it out?

I have tested it out, and it fails with more than two arguments to the method call as one would expect based on the method signature:

irb(main):002:0> Client.find_by_sql(‘select ?, ?’, 1, 2)

ArgumentError: wrong number of arguments (3 for 1…2)

from /Users/.../.rbenv/versions/ruby-2.1/lib/ruby/gems/2.1.0/gems/activerecord-4.2.2/lib/active_record/querying.rb:38:in `find_by_sql'

from (irb):2

You need to have a deeper understanding of ruby, and specifically the magic of the splat (*) operator.

I think you missed that the method signature is def find_by_sql(sql, binds = []), and not def find_by_sql(sql, *binds). There’s no splat in the method signature.

Before trying to belittle someone, maybe make sure that you’re actually correct :stuck_out_tongue:

Remember MINSWAN, people.

(As for the on-topic conversation, sorry, I have no idea what’s going on with that binds argument.)

Cheers,

-foca

Furthermore, the definition of find_by_sql truly is confusing in because when used the way the docs say to call it, the binds variable does not contain the bind variable. For example:

irb(main):010:0> def t(x, y=[])

irb(main):011:1> puts “x: #{x.inspect}”

irb(main):012:1> puts “y: #{y.inspect}”

irb(main):013:1> end

=> :t

irb(main):014:0> t(1,2)

x: 1

y: 2

=> nil

irb(main):016:0> t([1,2])

x: [1, 2]

y: []

The bind argument was added as bind_values all the way back in 2010, with no change to the documentation: https://github.com/rails/rails/commit/cc468d3ec81d6f1298fca91c0549584b36dafcc6

So the docs are probably just wrong — I’m sure a documentation PR would be quickly merged.

@TJ: No, I don’t believe the docs are wrong. If you try the examples they only work as documented, and fail if go by what you’d expect from the method definition instead.

Sorry, I suppose by “wrong” I meant “incomplete”.

There is a binds argument, and it seems to be used, but it is not documented.

What is the binds argument used for? It’s not immediately clear from the source, and using it as one would expect doesn’t work either.

You were right, I apologize.

There’s also no test case (as far as I can find) that would cover passing the binds args to #find_by_sql

Anyone know what it’s supposed to be used for?

If I’m following the code, it looks like the binds argument is passed through #select_all and eventually down to the adapter’s #exec_query, e.g., the MySQL adapter. So, the behavior could be different depending on the adapter and even DB version. That makes sense, because different databases handle prepared statements and bind variables differently.

And it looks like the documentation example of #find_by_sql, passing the array with “bind variables” into the sql arg, works because that sql arg is being passed to #sanitize_sql, which has its own implementation of replacing variables outside of/before the adapter. So, using this syntax isn’t really using bind variables, passed to the database. It’s more string interpolation and sanitation.

So, following the code, I was able to get a query executing with the Postgres adapter and actually passing the bind variables to Postgres.

Example 1, using binds arg:

Project.find_by_sql(‘select * from projects where client_id = $1 and region = $2’, [[nil, 123], [nil, ‘West’]])

(NOTE: You can use the column name in place of nil, but I get a warning when I do that.)

Results in this statements in the PG log:

LOG: execute a3: select * from projects where client_id = $1 and region = $2

DETAIL: parameters: $1 = ‘123’, $2 = ‘West’

Example 2, specifying the variables in the sql arg:

Project.find_by_sql([‘select * from projects where client_id = ? and region = ?’,123, ‘West’])

Results in:

LOG: execute : select * from projects where client_id = 123 and region = ‘West’

So, you can see those are clearly different. In example 1, the statement is being prepared, and the bind variables are passed to Postgres. In example 2, the variables are being replaced by Rails via #sanitize_sql before the statement is passed on to the adapter, which results in an unnamed statement passed to Postgres with no bind variables. (I think, that’s how I’m interpreting it at least.)

I couldn’t get a true bind variable example working with MySQL, but I don’t have a convenient environment using recent versions of the adapter and MySQL server. I think this is related to Pat Shaughnessy’s post on prepared statements.

Following that logic, it’s not too surprising that standard model methods result in different behavior in PG and MySQL:

Project.find(1) using the Postgres adapter results in:

LOG: execute a4: SELECT “projects”.* FROM “projects” WHERE “projects”.“id” = $1 LIMIT 1

DETAIL: parameters: $1 = ‘1’

Project.find(1) using (my older version of) the MySQL adapter results in the MySQL equivalent of:

LOG: execute : SELECT “projects”.* FROM “projects” WHERE “projects”.“id” = 1 LIMIT 1

I haven’t really kept up with database internals enough to know how much of an improvement prepared statements yield. Years ago, prepared statements could give you a significant performance improvement, because the query parser can easily see two queries are the same, reuse copies of the execution plan, etc. My understanding is databases have gotten better about deriving similarities in queries and reusing plans, cached results, etc.

I don’t have an opinion (yet) on what should be done to improve the situation, but I hope that sheds a little light on what’s happening with the binds arg and why you might want to use it.

Cheers,

Al

I believe the binds are used to improve the statement caching and not suggested to be used directly.

Here is an example of an internal use when Post.find(1) executed:

id_query_attribute = ActiveRecord::Relation::QueryAttribute.new(:id, 1, Post.type_for_attribute(:id))

Post.find_by_sql(‘select * from posts where id = ?’, [id_query_attribute])

Ah, QueryAttribute was the part I was missing, didn’t have time to dig through the code and follow how something like #find was specifying the bind variables (correctly). That makes sense.

Thanks,

Al