Request for comments: prepared statements in ActiveRecord

Jeremy Kemper wrote:

I'd start with using bound variables first, though. They're always beneficial and are a natural stepping stone to prepared statements which can be added later for those queries which may benefit (many do not).

database) bind variables. For a while now I've had a working (passes all tests) implementation of AR that uses bind variables for inserts and updates, those being by far the easiest to tweak. Given how widely spread throughout the codebase the selects are, I haven't had the time to think through a reasonable approach there.

Though I like what was suggested (by you I think) back in Feb, around the idea of SqlString < String. Has anyone looked into that further?

Are you attending RailsConf this week?

Perhaps a group of us could get together to dig into this?

I like the gist of your idea and approach. Thanks for working on it.

I'd start with using bound variables first, though. They're always beneficial and are a natural stepping stone to prepared statements which can be added later for those queries which may benefit (many do not).

Thanks. :slight_smile: Though I don't exactly understand what you mean - aren't prepared statements and bounded variables the same thing? That is, is it not so that one cannot exist without the other? My patch uses both prepared statements and bounded variables, in case that wasn't clear.

Sorry I didn't comment on your earlier postgres patch: it includes a lot of superfluous style edits and untested changes, so I didn't review further. I salute your effort but it could use a lot of cleanup before committing.

No need to apologize. And I'd like to add a minor correction: the PostgreSQL patch isn't mine, I'm merely trying to get attention on that patch, because it has a lot of fixes that I depend on.

Are you attending RailsConf this week?

I will not be attending RailsConf.

I have no experience with Oracle, but according to some blogs, such as this one:

Rails performs badly because of the lack of use of prepared statements. Here is a quote from Greg Luck: "We have two production applications running on Ruby. And how is it. Well, despite being perhaps no more than 5% of the functionality of our applications, Ruby on Rails is the number one consumer of Oracle CPU and logical gets. Why? Rails does not support prepared statements, so Oracle has to reparse every time."

And I forgot to mention this last time, but there's one more reason for using prepared statements/bounded variables: binary support in PostgreSQL. I have a Rails app which uses PostgreSQL, and its Person.password_password column was of the type bytea. It contains a raw hash of the user's password. When a user logs in, the app performed the following query: Person.find_by_username_and_password_hash(username, calculate_hash(password)) This will send a query to PostgreSQL, and this query will contain a binary string. However, in PostgreSQL, binary strings must be escaped with escape_bytea() - the regular string escaping function will not do. Because the binary string isn't escaped properly, the whole database connection breaks (it throws an exception, something about "character not allowed in Unicode mode" or something).

One may wonder why the binary string isn't escaped correctly. The reason is this: 1. It seems impossible in Ruby to determine whether a String contains binary data without scanning through the entire String (expensive), so the current PostgreSQL adapter assumes that strings are text strings, and escapes them as text strings, unless it knows that this string belongs to a binary column. 2. During ActiveRecord::Base.save, ActiveRecord will now which attributes belong to which columns, so saving records that contain binary data is no problem. The string escaper will have knowledge about which argument belongs to which column. 3. However, things go wrong during ActiveRecord::Base.find. In there, it is impossible to find out which arguments belong to which table columns.

By using prepared statements/bounded variables, we can eliminate this entire problem. Strings will not be escaped, but be directly bounded to a prepared statements.

I'm sorry, I do not understand what you mean by "flushing statements" and "allow it to grow to unreasonable levels".

Hongli Lai wrote:

I'd have to disagree with you there. Rails actually performs very well with Oracle. A change made a while back to default to "similar" cursor_sharing resulted in much of the benefit of bind variables. "performs badly" is definitely too strong.

I have no experience with Oracle, but according to some blogs, such as this one: Uncommented Bytes - Jeff Sheets: Major Enterprise Ruby on Rails Issues? Rails performs badly because of the lack of use of prepared statements. Here is a quote from Greg Luck: "We have two production applications running on Ruby. And how is it. Well, despite being perhaps no more than 5% of the functionality of our applications, Ruby on Rails is the number one consumer of Oracle CPU and logical gets. Why? Rails does not support prepared statements, so Oracle has to reparse every time."

This was before Rails 1.2 which introduced the cursor_sharing=similar session parameter. It's oracle-specific and makes the database rewrite the queries on the fly to use bind variables instead of literal values.

Btw even before 1.2 you may had used that parameter (only database-wide instead of on a single application).

In any case using real bind variable would represent a further optimization step, as it'd allow a fine tuning of which bind variables vs literal values use in each query.

Luca

The former refers to your intention to drop prepared statements after each query. The latter is what would happen if you didn’t cleanup these prepared statements - you’d suddenly have hundreds of them in the database. I think databases would blindly allow it to grow to unreasonable levels - can they do otherwise? Prepared statements were designed to be made manually; I don’t think any of us can predict what happens when you have an insane number of them active, eg. would it be a hog, would it take up memory, etc.

Luca Mearelli wrote:

Hongli Lai wrote: In any case using real bind variable would represent a further optimization step, as it'd allow a fine tuning of which bind variables vs literal values use in each query.

Agreed. In my tests the _client_ performance impact of using real bind variables instead of relying on cursor_sharing of similar was basically non-existent, but there was some (small, but still some) server improvement. Plus some Oracle query plans worked better (ie., required fewer hints) when constants were left as constants.

Jeremy Kemper wrote: > I'd start with using bound variables first, though. They're always > beneficial and are a natural stepping stone to prepared statements > which can be added later for those queries which may benefit (many do > not).

From my tests w/ Oracle, all the benefit was in the use of (real, aka database) bind variables. For a while now I've had a working (passes all tests) implementation of AR that uses bind variables for inserts and updates, those being by far the easiest to tweak. Given how widely spread throughout the codebase the selects are, I haven't had the time to think through a reasonable approach there.

Though I like what was suggested (by you I think) back in Feb, around the idea of SqlString < String. Has anyone looked into that further?

I wish :slight_smile:

> Are you attending RailsConf this week?

Perhaps a group of us could get together to dig into this?

That'd be awesome. I'll be at the hackfest at the Jupiter hotel the nights of the 17th-19th. It's at 800 E Burnside St, about a half mile from the convention center.

jeremy

Jeremy Kemper wrote:

Though I like what was suggested (by you I think) back in Feb, around the idea of SqlString < String. Has anyone looked into that further?

I wish :slight_smile:

I actually started this morning to look into it :wink:

Are you attending RailsConf this week?

Perhaps a group of us could get together to dig into this?

That'd be awesome. I'll be at the hackfest at the Jupiter hotel the nights of the 17th-19th. It's at 800 E Burnside St, about a half mile from the convention center.

aahh I won't be at the conf... any way to meet in a virtual space? perhaps the #railsconf irc channel?

Luca

I tested this with MySQL. I can't have unlimited number of prepared statements. After a certain threshold, queries will fail. My MySQLAdapter implementation caches up to 32 prepared statements (this number can be changed). If the cache is full, the least recently used one will be removed. Though, according to benchmarks, caching prepared statements doesn't improve performance at all on MySQL.

I tested this with MySQL. I can’t have unlimited number of prepared statements. After a certain threshold, queries will fail.

It would be good to know what is this treshold.

… according to benchmarks, caching prepared statements doesn’t improve performance at all on MySQL.

It is possible that there is a performance boost on raw queries, but that it’s not visible because the ActiveRecord object instantiation time is an order of magnitude higher.

Also, simple queries against a database that has a miniscule amount of records don’t benefit much from having a cached execution plan. You could first benchmark several hundred thousand INSERTs, then UPDATE those records, then query those tables with SELECTs with as much conditions and joins you could throw in. Don’t forget the database indexes which play an important role in laying out the execution plan.

It would be good to know what is this treshold.

For some reason I'm unable to reproduce that problem now. Perhaps I was mistaken.

It is possible that there is a performance boost on raw queries, but that it's not visible because the ActiveRecord object instantiation time is an order of magnitude higher.

Also, simple queries against a database that has a miniscule amount of records don't benefit much from having a cached execution plan. You could first benchmark several hundred thousand INSERTs, then UPDATE those records, then query those tables with SELECTs with as much conditions and joins you could throw in. Don't forget the database indexes which play an important role in laying out the execution plan.

I've searched Google for benchmarks on prepared-vs-unprepared statements, and to my surprise I couldn't find any. I'll try to setup a good benchmark.

Yes, I'd be delighted to. :slight_smile: I hope we can solve this together.

Thank you so much for working on this feature. Prepared statement/ bind variable support is critical for Oracle projects, especially in the (evil) enterprise world. I know that Rails isn't targeted for large enterprise apps, but that doesn't mean it isn't targeted at _enterprises_. Lots of small and medium-sized internal apps would benefit from Rails, but interoperability/ROI may dictate Oracle as the database. Enterprise DBAs may (rightly, IMHO) scoff at a framework that lacks prepared statement support.

Brian Hartin

Very interesting to see you working on this project. I work a lot for Enterprises and it makes a lot of difference when it comes to Oracle with prepared statements and data binding. In case of loops etc. we had find the performance difference to be in multiples.

I was just wondering how far this development has reached and when I would be able to use it. Planning a large rails project around Oracle soon. Good thing I know that the existing rails work and in future we would be able to get much better performance.

Deepak

Good to see someone appreciates my work. :slight_smile: MySQL support is probably done. I was still working on PostgreSQL support. There's no Oracle support though as I don't have access to an Oracle database, but modifying the Oracle adapter should be fairly trivial. The work is currently on hold as I'm busy with other things, but I intent to resume this work in the near future.

Hi Hongli,

> Very interesting to see you working on this project. I work a lot for > Enterprises and it makes a lot of difference when it comes to Oracle > with prepared statements and data binding. In case of loops etc. we > had find the performance difference to be in multiples.

> I was just wondering how far this development has reached and when I > would be able to use it. Planning a large rails project around Oracle > soon. Good thing I know that the existing rails work and in future we > would be able to get much better performance.

Good to see someone appreciates my work. :slight_smile: MySQL support is probably done. I was still working on PostgreSQL support. There's no Oracle support though as I don't have access to an Oracle database, but modifying the Oracle adapter should be fairly trivial. The work is currently on hold as I'm busy with other things, but I intent to resume this work in the near future.

Did I hear you live in the Enschede region? :slight_smile: I do too and would like to offer my help to implement the PostgreSQL side of things. Send me a note if you'd like to meet up sometime and I'd gladly share drinks and thoughts.