Request for comments: prepared statements in ActiveRecord

I'm working on long-awaited support for prepared statements in ActiveRecord. Before I finish my work, I would like to know whether the Ruby on Rails core team accepts my design, and whether they will review my patch (once it is finished) seriously. I have documented the design at: http://izumi.plan99.net/blog/?p=35

Please comment on this. Thank you.

Nicely done! But am I to understand StatementBuilder code is going to replace (cleanup) all the string hacks currently involved in building SQL queries? Will the usage of prepared statements in MySQL/PostgreSQL become implicit with this? Also, did you follow the “ActiveRecord refactoring” thread [1]? Did Zach [2][3] really do it?

[1] http://groups.google.com/group/rubyonrails-core/browse_thread/thread/32659af615f48fd5/1d9170c2e1b05979 [2] http://www.continuousthinking.com/ [3] http://groups.google.com/groups/profile?enc_user=WZYiNRUAAADUiWRahsdnVJcZQgJLgcVW9h3i3SmjGmAJbX05nZ-8fQ

Nicely done! But am I to understand StatementBuilder code is going to replace (cleanup) all the string hacks currently involved in building SQL queries? Will the usage of prepared statements in MySQL/PostgreSQL become implicit with this?

Thanks. :slight_smile: The use of prepared statements in MySQL/PostgreSQL will completely replace the argument escaping stuff. Database adapters that don't support prepared statements (or haven't implemented support yet) will silently fallback to argument escaping, as is done now.

Also, did you follow the "ActiveRecord refactoring" thread [1]? Did Zach [2][3] really do it?

[1]http://groups.google.com/group/rubyonrails-core/browse_thread/thread/… [2]http://www.continuousthinking.com/ [3]http://groups.google.com/groups/profile?enc_user=WZYiNRUAAADUiWRahsdn

I didn't follow any of these threads, but I have read the ActiveRecord::Extensions website in the past.

This prepared statements support is entirely my own work, and is not related to Zach's work.

So, every query ever issued will become a prepared statement. Isn’t this going to lead to unnecessary overhead? I mean, there are a lot of various SELECT statements generated by even a simple Rails application. Does it really make sense to prepare every one of them? Having backends cache the execution plan for rare SELECTs can possibly have undesirable overhead.

Also, how will ActiveRecord know how to reuse these statements? For example:

a = Author.find(1) post = Author.posts.first post.author

The first and the last query are different to AR, yet they take a same shape (“SELECT * FROM authors WHERE authors.id = ?”) and can both be represented by a single prepared statement. With your solution, will AR know how to reuse the prepared statement in different places, or will it try to prepare it twice?

The current implementation immediately removes prepared statements after using them once, so they will be reparsed twice. Right now I'm trying to make it actually work and getting the framework in place. Optimization can be done later (and I'm fairly confident that it can be optimized). In the future I can add caching. The cache will have a limited size, and items can be removed based on least-recently-used or some other criteria, similar to how CPU caches work.

Doesn’t that negate the whole idea of having prepared statements in the database in the first place? I don’t think Rails core team would accept the patch that slows down ActiveRecord just for the sake of using a database feature (in a wrong way).

Actually, no. One of the problems with quoting string manually is excessive memory usage in Rails. For example, if you're uploading a 1 MB photo to your database, then Rails has to quote a 1 MB string, which is very slow and results in more than 1 MB memory usage because of string overheads. Using prepared statements will remove this problem. And I'm not sure that using prepared statements (and immediately deleting them) really makes things slower. Why would it be? After all, if you're not using prepared statements then the database will have to reparse each statement every time anyway. This will need more benchmarking. The prepared statement caching that I mentioned should be trivial to implement. It's on my todo list.

It'll be somewhat slower than just throwing a SQL statement at the database because it adds extra synchronous communication associated with preparing the statement. In most cases it'll be negligible.

From past experiences, I would expect to see a noticeable, if not huge, gain from caching prepared statements. At least, with Oracle.

I’d dare to say that Oracle isn’t exactly very popular in Rails world. The majority uses Postgres and MySQL, and they would see a noticable gain only when these prepared statements were reused for multiple queries.

I'd dare to say that Oracle isn't exactly very popular in Rails world.

It is in my corner of the woods. Let me not mention that evil E-word again :slight_smile:

and they would see a noticable gain only when these prepared statements were reused for multiple queries.

It's the same with Oracle. Prepared statement is merely a way to avoid hitting SQL parser and query optimizer.

That's all good and well but right now I have my hands full on making it *work* (I have a ton of unit test failures to fix). I promise you I will take care of performance once this actually works.

Are there any other discussion points left?

Yes. In your place I would contact Zach to get an update on his progress. If he really did some refactoring like he said, it is most likely that he made some statement builder class himself. Picking up where he left off in this area would be a boost.

I wish you luck with the project! Hopefully your cleanup will finally make AR::Base (and related files) SQL-free; connection adapters and “statements builders” should handle that logic.

Alexey Verkhovsky wrote:

and they would see a noticable gain only when these prepared statements were reused for multiple queries.

It's the same with Oracle. Prepared statement is merely a way to avoid hitting SQL parser and query optimizer.

I've done some testing w/ Oracle, and a very significant gain comes from simply using real bind variables. Caching the prepared statements in the client doesn't matter much at all, presuming you're going to have multiple clients. But the use of real bind variables allows Oracle itself to cache the statements and avoid multiple hard parses.

Here's an update. I noticed that using prepared statements will indeed make MySQL queries a bit slower. The unit tests in an unmodified Rails edge source tree took 18 seconds. After adding prepared statements, they took 28 seconds. I've been working on optimizing this since yesterday evening, and I succeeded. Using prepared statements is only advantageous in MySQL if the arguments are large (i.e. uploading a large image), so the code will now fallback to _not_ using prepared statements if it detects that no argument is larger than 32 KB. The unit tests now take 18 seconds again, thus eliminating the performance problem entirely. Something similar can be implemented in other adapters.

It noticed this comment in activerecord/test/binary_test.rb:   # Without using prepared statements, it makes no sense to test   # BLOB data with SQL Server, because the length of a statement is   # limited to 8KB.

And I forgot to say this in my last email. Before I added the fallback code, I implemented a cache for compiled prepared statements. The cache hit rate is 65%, but it resulted in no noticeable performance improvement, at least on MySQL.

I know this is tangential, but is there any good reason for storing images in the database?

I can think of a few, but they're not very good;   - security (but that can be handled with http or other auth).   - logging (you want to send your images through rails so you know the # of downloads)   - ??

I've always advised people that filesystem is preferrable (it's a file, keep in in the file system) since you can easily back it up, and you don't have to hit rails to get the image (sloooow, and uses up an app listener).

If you ignore the problem of quoting 1mb strings from image uploads, what other reason is there for prepared statements?

I know this is tangential, but is there any good reason for storing images in the database?

I can think of a few, but they're not very good;   - security (but that can be handled with http or other auth).   - logging (you want to send your images through rails so you know the # of downloads)   - ??

- All data is in one place. That may make backupping easier. - Support for transactions. Suppose that you not only store the image, but also some metadata. If I first save the image file on disk, and then save the metadata, but the computer crashes during metadata saving, then I'm left with a stale file that's not used. If I first save the metadata and then saves the image file, and the computer crashes during image file saving, then I have a corrupted image file but intact metadata.

If you ignore the problem of quoting 1mb strings from image uploads, what other reason is there for prepared statements?

Oracle support. Oracle lives by prepared statements, even if those prepared statements aren't reused, as someone has pointed out. One can argue that few Rails developers use Oracle, but I can also argue that that's because Rails performs badly on Oracle.

Furthermore: - My code will fallback to using "unprepared statements" if it thinks that using prepared statements is not necessary, so performance is not a problem. - It moves a lot of the statement building code to separate classes. This makes ActiveRecord::Base smaller, and thus easier to read and to maintain.

I like the gist of your idea and approach. Thanks for working on it.

I'd start with using bound variables first, though. They're always beneficial and are a natural stepping stone to prepared statements which can be added later for those queries which may benefit (many do not).

Sorry I didn't comment on your earlier postgres patch: it includes a lot of superfluous style edits and untested changes, so I didn't review further. I salute your effort but it could use a lot of cleanup before committing.

Are you attending RailsConf this week?

jeremy

Just wondering, but is there any advantage to trying to manually manage the execution plan cache? I know MSSQL-Server for example caches the execution-plan of every parameterized Query (anything executed with the sp_executesql stored-procedure), which means the majority of ADO and ADO.NET based applications, and no one on that side of the fence would ever worry about such a thing; the server manages that cache itself, exactly as it manages what indexes to keep in memory at any given moment.

So I guess my question is: Is flushing the statements really necessary or are you just working under an assumption the database (Postgres/ MySQL) will blindly allow it to grow to unreasonable levels?

Hongli Lai wrote:

Oracle support. Oracle lives by prepared statements, even if those prepared statements aren't reused, as someone has pointed out. One can argue that few Rails developers use Oracle, but I can also argue that that's because Rails performs badly on Oracle.

I'd have to disagree with you there. Rails actually performs very well with Oracle. A change made a while back to default to "similar" cursor_sharing resulted in much of the benefit of bind variables. "performs badly" is definitely too strong.