Prepared statements (PostgreSQL)

Bashed this together after reading Ola Bini's post about his annoyances about the MySQL centric approach of ActiveRecord at

I decided to find out if it was really that bad, and I suppose it might be. I spent some time trying to get the type information about columns from ActiveRecord into the driver, but I couldn't come up with a good way to do it.

So I started over and wrote this query cache thingie into the PostgreSQL driver. It's a giant hack, ugly and buggy, but it passes all ActiveRecord tests except one (where it results in two queries being made, one for the prepare and another for the execute, instead of a single query doing select) and I managed to try it out on a site of mine in development mode.

With prepared statements I got: Completed in 1.59062 (0 reqs/sec) | Rendering: 1.31451 (82%) | DB: 0.25888 (16%) | 200 OK [http://192.168.0.46/characters\] Completed in 1.61538 (0 reqs/sec) | Rendering: 1.36034 (84%) | DB: 0.23845 (14%) | 200 OK [http://192.168.0.46/characters\]

Without prepared statements: Completed in 1.74804 (0 reqs/sec) | Rendering: 1.23182 (70%) | DB: 0.49718 (28%) | 200 OK [http://192.168.0.46/characters\] Completed in 1.67329 (0 reqs/sec) | Rendering: 1.21762 (72%) | DB: 0.43935 (26%) | 200 OK [http://192.168.0.46/characters\]

So that's a nice speed up, and while that might be a poor example beause of the high number of repeated selects in it, I think that prepared statements should be supported in some way by ActiveRecord, as soon as possible.

It's a giant hack and I'm actually surprised it works at all, I'm not proud of this code but it told me what I wanted to know, whether or not prepared statements really are that cool or not (turns out they are).

Index: activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb

Roderick van Domburg wrote:

Well, you sure have got an awesome proof of concept! Why don't you take it up with the maintainer of the PostgreSQL adapter or the core team? It would be great to see more of this.

Thank you! However...

If this was for the PostgreSQL adapter and only the PostgreSQL adapter, that might be worth doing, but it's more than that. Any database supporting prepared statements could do this. Let's turn it into a wrapper around all the calls to select and insert for all the databases that support it. That would be great, right? Actually, not really, for several reasons. 1) even if it was rewritten to do proper parsing of the SQL, it would still cause unnecessary overhead (possibly even more so if it did proper parsing) 2) there's no control over when the prepared statements are being made. Here I use a minimum of 3 times, then the statement is prepared. In a production setting that would mean that almost every statement is prepared, which isn't optimal. Higher limits would still get just about every query prepared, which isn't always optimal. 3) if this was solid it would still only be treating the symptoms and hiding the real problem

What real problem? ActiveRecord is, uhm... well, let's just say it's not well structured for this kind of thing (I'd love to be proven wrong on this) and needs to be refactored, and rewritten in large parts, if this was to be done cleanly. SQL statements are built in several parts, piece for piece, and no information is saved about what actually goes into it. That's why we need to parse the SQL string to find things we should already know. ActiveRecord made that SQL string, so ActiveRecord should know what variables, tables and values are in there, and we should be able to use that to do useful things like prepare/execute before it hits the adapter. Adapters are stupid, and they should be. I think ActiveRecord is stupid because it's throwing all this information away, and it shouldn't be.

Ideally ActiveRecord would use a slightly more abstract format for queries that then gets turned into SQL in one place... and while I don't know a lot about ActiveResource, standing on the outside and looking in, it seems to me like it should be an ActiveRecord with different storage. A method like 'delete' shouldn't build a SQL string, it should send a delete message to the object's storage object, whether it's REST or SQL or something else (file based storage could have some uses I bet). Or implementing caching for all "ActiveObjects", which should also be done above the SQL/REST stuff, there's no way to do that either from what I can tell.

I bet I'm going to have to regret that last part. :slight_smile: Anyways, the main focus is and should be that ActiveRecord isn't very well structured for implementing features like prepare/execute, and until it is, I can't see how it can be done well. And if we can't do it well, we might as well not do it at all (unless it's for testing or research, like this was). If I'm wrong and there is a good way to implement it, it should be made a top priority for 2.0 or 2.1, because this is more important that providing even more helpers. People like to call Ruby on Rails slow, and the defenders like to say that most of the time is spent in the database anyway, yet here we could get a huge database increase and yet it's still not done. Someone posted a couple of weeks ago about how he had been called up by the DBA and was told to shut down his Rails application because it caused too much load on the database, and 80% of the time was spent preparing statements. I don't think there's any good excuse for this, and it's time to do something about it, but this here isn't the way to go.

In my humble opinion. :wink: Probably came across as pompous bastard there, which wasn't my intention. I'd love to help working on a real solution to this problem but I don't know where to start (and even if I did, I couldn't do it alone, and if I did it wouldn't be accepted anyways). I'm trying to show how useful prepared statements can be to hopefully bring some more attention to this.

Joel, good input all around. Not to punt on this, but I fear that refactoring will turn into a class-for-every-noun design spree/quagmire. AR works really well for its common case; we don’t need to solve the non-problems that some ORMs relish.

However, it’s getting creaky in spots and could use some love. A good example is how anna@wota.jp reworked eager associations: http://dev.rubyonrails.org/ticket/3913

jeremy

I'm with you ... but it's about #10 on my list right now.

You've got some good ideas here. I agree that we need to get prepared statements in AR.

I just spent about 20 minutes working on this proof of concept. It's really ugly, but it shows a different approach to using PS.

Here's how it works. First you extend AR::Base with the module below. Then your class could look like: class Person < ActiveRecord::Base   prepare :find_by_name end

When the Person class is loaded, it will build a prepared statement and execute it against the database. Then it defines find_by_name to execute the PS instead of building a standard SQL query.

The main advantage that I can see is that the prepared statement is explicit. Instead of trying to figure out if it should prepare a statement after a certain number of queries, you can just say define a particular method to use a prepared statement.

My implementation blows (it's at the bottom of this post), but as I said it was just a quick proof of concept. I'm not super well-versed in AR, so I don't know what exactly I can use. Ideally I'd like to somehow get a string that represents the SQL that AR would generate on its own, then use that to create the prepared statement.

I think it could be implemented to handle inserts easily.

There are some issues that need to be handled, like making sure not to prepare an exising PS, or making sure one exists before executing it. But those are of course things that can come later if this seems like a good approach. Also I think some of this probably belongs in the connector, though I'm not entirely sure what parts.

What do you think of this approach? Is it worth pursuing?

Pat

module PreparedPostgres   def prepare(method_id)     if match = /find_(by)_([_a-zA-Z]\w*)/.match(method_id.to_s)       finder = determine_finder(match)       attribute_names = extract_attribute_names_from_match(match)       super unless all_attributes_exists?(attribute_names)

      cols = columns.select { |c| attribute_names.include?(c.name) }       col_types = cols.collect { |c| ps_type c.type }       conds =       cols.each { |c| conds << "#{c.name}=$#{conds.size + 1}" }       prep = "PREPARE #{method_id.to_s} (#{col_types.join(", ")}) AS " +              "SELECT * FROM #{table_name} WHERE #{conds.join(" AND ")};"       connection.execute prep;

      (class << self; self; end).instance_eval do         define_method(method_id) do |*args|           qargs = args.collect { |a| quote a }           statement = "EXECUTE #{method_id.to_s}(#{qargs.join(", ")});"           find_by_sql(statement).first;         end       end     end   end

  private   def ps_type(type)     case type     when :integer then :int     when :string then :text     end   end end

I am glad to see that in the last week we have gone from "we should do something" to "here's a proof of concept."

Here's a couple considerations and implications:

1. Only the database connector knows (or should be responsible for knowing) whether or not the underlying RDBMS supports prepared statements.

2. ActiveRecord needs to be the one responsible for determining whether or not a prepared statement exists. Otherwise it will generate a lot of unneeded sql strings that are subsequently ignored in favor of the prepared statements.

The implication is that the availability of prepared statements must be exposed as a standardized (boolean) property of the RDBMS connector, and then the ActiveRecord can ignore (current behavior) or use it later as it matures.

If a mechanism for generating a "footprint" for a prepared statements can be determined, that is more efficient than generating an SQL string, then we are in very good shape.

The "find_by_id" idiom is a good trivial case, reflecting 90% of SQL in my own rails apps. However, it is incomplete.

A trivial footprint consists of the SQL from the "from" clause to the end of the statement, prior to sanitize/populate parameters. The code for this pre-exists, and as a String it provides a good key into a hash map. Once the "from" and "where" clause are computed, the actual prepared statement could be obtained by something along these lines

    # This is entirely pseudocode. Any resemblance to Ruby is     # randomly intentional but entirely unreliable

    footprint=... # existing "from ... where ..." code     if connector.supports_prepared_statement then       stmt=stmt_map[footprint]       # populate parameters ...       ...       # execute query       result = stmt.execute     end     else         # existing code     end

Unless specifically noted (like in find_by_sql), prepared statement queries would return all columns from all addressed tables. find_by_sql should use the full sql as a footprint.