I'm working on long-awaited support for prepared statements in
ActiveRecord. Before I finish my work, I would like to know whether
the Ruby on Rails core team accepts my design, and whether they will
review my patch (once it is finished) seriously. I have documented the
design at: http://izumi.plan99.net/blog/?p=35
Nicely done! But am I to understand the StatementBuilder code is going to replace (clean up) all the string hacks currently involved in building SQL queries? Will the usage of prepared statements in MySQL/PostgreSQL become implicit with this? Also, did you follow the “ActiveRecord refactoring” thread [1]? Did Zach [2][3] really do it?
Nicely done! But am I to understand the StatementBuilder code is going to
replace (clean up) all the string hacks currently involved in building SQL
queries? Will the usage of prepared statements in MySQL/PostgreSQL become
implicit with this?
Thanks. The use of prepared statements in MySQL/PostgreSQL will
completely replace the argument escaping stuff. Database adapters that
don't support prepared statements (or haven't implemented support yet)
will silently fall back to argument escaping, as is done now.
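A minimal sketch of that fallback, with hypothetical class and method names (the real adapters expose different interfaces): if the connection can prepare statements we bind the values, otherwise we escape and interpolate them as today.

```ruby
# Hypothetical sketch, not the actual patch: an executor that prefers
# prepared statements but falls back to argument escaping.
class StatementExecutor
  def initialize(connection)
    @connection = connection
  end

  # Run "SELECT ... WHERE x = ?" with the given bind values.
  def execute(sql, binds)
    if @connection.respond_to?(:prepare)
      @connection.prepare(sql).execute(*binds)
    else
      # Fall back to current behavior: escape and interpolate.
      @connection.query(interpolate(sql, binds))
    end
  end

  private

  # Replace each "?" placeholder with a quoted literal, in order.
  def interpolate(sql, binds)
    binds.each_with_object(sql.dup) do |value, out|
      out.sub!("?") { quote(value) }
    end
  end

  # Toy quoting rule: numbers pass through, strings get SQL-style
  # single-quote doubling.
  def quote(value)
    value.is_a?(Numeric) ? value.to_s : "'#{value.to_s.gsub("'", "''")}'"
  end
end
```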
Also, did you follow the "ActiveRecord refactoring"
thread [1]? Did Zach [2][3] really do it?
So, every query ever issued will become a prepared statement. Isn’t this going to lead to unnecessary overhead? Even a simple Rails application generates a lot of different SELECT statements. Does it really make sense to prepare every one of them? Having the backend cache an execution plan for a rarely-run SELECT may itself impose undesirable overhead.
Also, how will ActiveRecord know how to reuse these statements? For example:
a = Author.find(1)
post = a.posts.first
post.author
The first and the last query are different to AR, yet they take the same shape (“SELECT * FROM authors WHERE authors.id = ?”) and can both be represented by a single prepared statement. With your solution, will AR know how to reuse the prepared statement in different places, or will it try to prepare it twice?
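To make the “same shape” point concrete, here is a toy normalization that maps both queries to one parameterized form. This is not how ActiveRecord builds statements; it just shows that the two call sites could key the same cache entry.

```ruby
# Toy shape extraction: replace integer literals with placeholders so
# that structurally identical queries collapse to one cache key.
def statement_shape(sql)
  sql.gsub(/\b\d+\b/, "?")
end

q1 = statement_shape("SELECT * FROM authors WHERE authors.id = 1")
q2 = statement_shape("SELECT * FROM authors WHERE authors.id = 42")
# q1 and q2 are now the same string, so one prepared statement
# could serve both call sites.
```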
The current implementation removes a prepared statement immediately
after its first use, so the statement would be prepared twice. Right now I'm
trying to make it actually work and getting the framework in place.
Optimization can be done later (and I'm fairly confident that it can
be optimized).
In the future I can add caching. The cache will have a limited size,
and items can be removed based on least-recently-used or some other
criteria, similar to how CPU caches work.
Doesn’t that negate the whole idea of having prepared statements in the database in the first place? I don’t think the Rails core team would accept a patch that slows down ActiveRecord just for the sake of using a database feature (in the wrong way).
Actually, no. One of the problems with quoting strings manually is
excessive memory usage in Rails. For example, if you're uploading a 1
MB photo to your database, then Rails has to quote a 1 MB string,
which is very slow and results in more than 1 MB memory usage because
of string overheads. Using prepared statements will remove this
problem.
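A rough illustration of the memory argument (the escaping rule here is simplified, not MySQL's exact one): escaping builds a second, often larger, copy of the data, while a bind parameter can be handed to the driver as-is.

```ruby
# ~1 MB of binary data whose bytes all need quoting (worst case).
blob = "\x00'" * 500_000

# MySQL-style escaping: every matched byte gains a backslash, so this
# allocates a full extra copy of the data, here twice the size.
escaped = blob.gsub(/['\\\x00]/) { |c| "\\#{c}" }

# With a bind parameter, the original 1 MB string would be sent to the
# driver untouched; no escaped copy is ever built.
```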
And I'm not sure that using prepared statements (and immediately
deleting them) really makes things slower. Why would it be? After all,
if you're not using prepared statements then the database will have to
reparse each statement every time anyway. This will need more
benchmarking.
The prepared statement caching that I mentioned should be trivial to
implement. It's on my todo list.
It'll be somewhat slower than just throwing a SQL statement at the
database because it adds extra synchronous communication associated
with preparing the statement. In most cases it'll be negligible.
From past experience, I would expect to see a noticeable, if not
huge, gain from caching prepared statements. At least, with Oracle.
I’d dare to say that Oracle isn’t exactly very popular in the Rails world. The majority uses Postgres and MySQL, and they would see a noticeable gain only when these prepared statements were reused for multiple queries.
That's all well and good, but right now I have my hands full making
it *work* (I have a ton of unit test failures to fix). I promise you I
will take care of performance once this actually works.
Yes. In your place I would contact Zach to get an update on his progress. If he really did some refactoring like he said, it is most likely that he made some statement builder class himself. Picking up where he left off in this area would be a boost.
I wish you luck with the project! Hopefully your cleanup will finally make AR::Base (and related files) SQL-free; connection adapters and “statement builders” should handle that logic.
and they would see a noticeable gain only when these prepared statements
were reused for multiple queries.
It's the same with Oracle. A prepared statement is merely a way to avoid
hitting the SQL parser and query optimizer.
I've done some testing w/ Oracle, and a very significant gain comes from simply using real bind variables. Caching the prepared statements in the client doesn't matter much at all, presuming you're going to have multiple clients. But the use of real bind variables allows Oracle itself to cache the statements and avoid multiple hard parses.
Here's an update. I noticed that using prepared statements will indeed
make MySQL queries a bit slower. The unit tests in an unmodified Rails
edge source tree took 18 seconds. After adding prepared statements,
they took 28 seconds. I've been working on optimizing this since
yesterday evening, and I succeeded.
Using prepared statements is only advantageous in MySQL if the
arguments are large (e.g. when uploading a large image), so the code will
now fall back to _not_ using prepared statements if it detects that no
argument is larger than 32 KB. The unit tests now take 18 seconds
again, thus eliminating the performance problem entirely. Something
similar can be implemented in other adapters.
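The heuristic described above might look like this; the constant and method name are illustrative, not the actual adapter code.

```ruby
# Prepare a statement only when at least one bind value is big enough
# for binding to pay off; below this, plain escaped SQL is faster.
PREPARE_THRESHOLD = 32 * 1024 # 32 KB, per the MySQL adapter change above

def use_prepared_statement?(binds)
  binds.any? { |v| v.is_a?(String) && v.bytesize > PREPARE_THRESHOLD }
end
```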
I noticed this comment in activerecord/test/binary_test.rb:
# Without using prepared statements, it makes no sense to test
# BLOB data with SQL Server, because the length of a statement is
# limited to 8KB.
And I forgot to say this in my last email. Before I added the fallback
code, I implemented a cache for compiled prepared statements. The
cache hit rate is 65%, but it resulted in no noticeable performance
improvement, at least on MySQL.
I know this is tangential, but is there any good reason for storing
images in the database?
I can think of a few, but they're not very good:
- security (but that can be handled with http or other auth).
- logging (you want to send your images through rails so you know
the # of downloads)
- ??
I've always advised people that the filesystem is preferable (it's a
file, keep it in the file system) since you can easily back it up, and
you don't have to hit rails to get the image (sloooow, and uses up an
app listener).
If you ignore the problem of quoting 1 MB strings from image uploads,
what other reason is there for prepared statements?
I know this is tangential, but is there any good reason for storing
images in the database?
I can think of a few, but they're not very good:
- security (but that can be handled with http or other auth).
- logging (you want to send your images through rails so you know
the # of downloads)
- ??
- All data is in one place. That may make backups easier.
- Support for transactions. Suppose that you not only store the image,
but also some metadata. If I first save the image file on disk, and
then save the metadata, but the computer crashes during metadata
saving, then I'm left with a stale file that's not used. If I first
save the metadata and then save the image file, and the computer
crashes during image file saving, then I have a corrupted image file
but intact metadata.
If you ignore the problem of quoting 1 MB strings from image uploads,
what other reason is there for prepared statements?
Oracle support. Oracle lives by prepared statements, even if those
prepared statements aren't reused, as someone has pointed out. One can
argue that few Rails developers use Oracle, but I can also argue that
that's because Rails performs badly on Oracle.
Furthermore:
- My code will fall back to using "unprepared statements" if it thinks
that using prepared statements is not necessary, so performance is not
a problem.
- It moves a lot of the statement building code to separate classes.
This makes ActiveRecord::Base smaller, and thus easier to read and to
maintain.
I like the gist of your idea and approach. Thanks for working on it.
I'd start with using bound variables first, though. They're always
beneficial and are a natural stepping stone to prepared statements
which can be added later for those queries which may benefit (many do
not).
Sorry I didn't comment on your earlier postgres patch: it includes a
lot of superfluous style edits and untested changes, so I didn't
review further. I salute your effort but it could use a lot of cleanup
before committing.
Just wondering, but is there any advantage to trying to manually
manage the execution plan cache? I know MS SQL Server, for example,
caches the execution plan of every parameterized query (anything
executed with the sp_executesql stored procedure), which means the
majority of ADO and ADO.NET based applications, and no one on that
side of the fence would ever worry about such a thing; the server
manages that cache itself, exactly as it manages what indexes to keep
in memory at any given moment.
So I guess my question is: Is flushing the statements really necessary
or are you just working under an assumption the database (Postgres/
MySQL) will blindly allow it to grow to unreasonable levels?
Oracle support. Oracle lives by prepared statements, even if those
prepared statements aren't reused, as someone has pointed out. One can
argue that few Rails developers use Oracle, but I can also argue that
that's because Rails performs badly on Oracle.
I'd have to disagree with you there. Rails actually performs very well with Oracle. A change made a while back to default to "similar" cursor_sharing resulted in much of the benefit of bind variables. "performs badly" is definitely too strong.