Prepared statements (PostgreSQL)

Bashed this together after reading Ola Bini's post about his annoyances
about the MySQL centric approach of ActiveRecord at
http://ola-bini.blogspot.com/2006/09/rails-databases-activerecord-and-path.html

I decided to find out if it was really that bad, and I suppose it might
be. I spent some time trying to get the type information about columns
from ActiveRecord into the driver, but I couldn't come up with a good
way to do it.

So I started over and wrote this query cache thingie into the
PostgreSQL driver. It's a giant hack, ugly and buggy, but it passes all
ActiveRecord tests except one (where it results in two queries being
made, one for the prepare and another for the execute, instead of a
single query doing select) and I managed to try it out on a site of
mine in development mode.

With prepared statements I got:
Completed in 1.59062 (0 reqs/sec) | Rendering: 1.31451 (82%) | DB:
0.25888 (16%) | 200 OK [http://192.168.0.46/characters]
Completed in 1.61538 (0 reqs/sec) | Rendering: 1.36034 (84%) | DB:
0.23845 (14%) | 200 OK [http://192.168.0.46/characters]

Without prepared statements:
Completed in 1.74804 (0 reqs/sec) | Rendering: 1.23182 (70%) | DB:
0.49718 (28%) | 200 OK [http://192.168.0.46/characters]
Completed in 1.67329 (0 reqs/sec) | Rendering: 1.21762 (72%) | DB:
0.43935 (26%) | 200 OK [http://192.168.0.46/characters]

So that's a nice speed up, and while that might be a poor example
beause of the high number of repeated selects in it, I think that
prepared statements should be supported in some way by ActiveRecord, as
soon as possible.

It's a giant hack and I'm actually surprised it works at all, I'm not
proud of this code but it told me what I wanted to know, whether or not
prepared statements really are that cool or not (turns out they are).

Index:
activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb

Roderick van Domburg wrote:

Well, you sure have got an awesome proof of concept! Why don't you take
it up with the maintainer of the PostgreSQL adapter or the core team? It
would be great to see more of this.

Thank you! However...

If this was for the PostgreSQL adapter and only the PostgreSQL adapter,
that might be worth doing, but it's more than that. Any database
supporting prepared statements could do this.
Let's turn it into a wrapper around all the calls to select and insert
for all the databases that support it. That would be great, right?
Actually, not really, for several reasons.
1) even if it was rewritten to do proper parsing of the SQL, it would
still cause unnecessary overhead (possibly even more so if it did
proper parsing)
2) there's no control over when the prepared statements are being made.
Here I use a minimum of 3 times, then the statement is prepared. In a
production setting that would mean that almost every statement is
prepared, which isn't optimal. Higher limits would still get just about
every query prepared, which isn't always optimal.
3) if this was solid it would still only be treating the symptoms and
hiding the real problem

What real problem? ActiveRecord is, uhm... well, let's just say it's
not well structured for this kind of thing (I'd love to be proven wrong
on this) and needs to be refactored, and rewritten in large parts, if
this was to be done cleanly. SQL statements are built in several parts,
piece for piece, and no information is saved about what actually goes
into it.
That's why we need to parse the SQL string to find things we should
already know. ActiveRecord made that SQL string, so ActiveRecord should
know what variables, tables and values are in there, and we should be
able to use that to do useful things like prepare/execute before it
hits the adapter.
Adapters are stupid, and they should be. I think ActiveRecord is stupid
because it's throwing all this information away, and it shouldn't be.

Ideally ActiveRecord would use a slightly more abstract format for
queries that then gets turned into SQL in one place... and while I
don't know a lot about ActiveResource, standing on the outside and
looking in, it seems to me like it should be an ActiveRecord with
different storage.
A method like 'delete' shouldn't build a SQL string, it should send a
delete message to the object's storage object, whether it's REST or SQL
or something else (file based storage could have some uses I bet).
Or implementing caching for all "ActiveObjects", which should also be
done above the SQL/REST stuff, there's no way to do that either from
what I can tell.

I bet I'm going to have to regret that last part. :slight_smile:
Anyways, the main focus is and should be that ActiveRecord isn't very
well structured for implementing features like prepare/execute, and
until it is, I can't see how it can be done well. And if we can't do it
well, we might as well not do it at all (unless it's for testing or
research, like this was).
If I'm wrong and there is a good way to implement it, it should be made
a top priority for 2.0 or 2.1, because this is more important that
providing even more helpers.
People like to call Ruby on Rails slow, and the defenders like to say
that most of the time is spent in the database anyway, yet here we
could get a huge database increase and yet it's still not done. Someone
posted a couple of weeks ago about how he had been called up by the DBA
and was told to shut down his Rails application because it caused too
much load on the database, and 80% of the time was spent preparing
statements.
I don't think there's any good excuse for this, and it's time to do
something about it, but this here isn't the way to go.

In my humble opinion. :wink:
Probably came across as pompous bastard there, which wasn't my
intention. I'd love to help working on a real solution to this problem
but I don't know where to start (and even if I did, I couldn't do it
alone, and if I did it wouldn't be accepted anyways). I'm trying to
show how useful prepared statements can be to hopefully bring some more
attention to this.

Joel, good input all around. Not to punt on this, but I fear that refactoring will turn into a class-for-every-noun design spree/quagmire. AR works really well for its common case; we don’t need to solve the non-problems that some ORMs relish.

However, it’s getting creaky in spots and could use some love. A good example is how anna@wota.jp reworked eager associations:

http://dev.rubyonrails.org/ticket/3913

jeremy

I'm with you ... but it's about #10 on my list right now.

You've got some good ideas here. I agree that we need to get prepared
statements in AR.

I just spent about 20 minutes working on this proof of concept. It's
really ugly, but it shows a different approach to using PS.

Here's how it works. First you extend AR::Base with the module below.
Then your class could look like:
class Person < ActiveRecord::Base
  prepare :find_by_name
end

When the Person class is loaded, it will build a prepared statement
and execute it against the database. Then it defines find_by_name to
execute the PS instead of building a standard SQL query.

The main advantage that I can see is that the prepared statement is
explicit. Instead of trying to figure out if it should prepare a
statement after a certain number of queries, you can just say define a
particular method to use a prepared statement.

My implementation blows (it's at the bottom of this post), but as I
said it was just a quick proof of concept. I'm not super well-versed
in AR, so I don't know what exactly I can use. Ideally I'd like to
somehow get a string that represents the SQL that AR would generate on
its own, then use that to create the prepared statement.

I think it could be implemented to handle inserts easily.

There are some issues that need to be handled, like making sure not to
prepare an exising PS, or making sure one exists before executing it.
But those are of course things that can come later if this seems like
a good approach. Also I think some of this probably belongs in the
connector, though I'm not entirely sure what parts.

What do you think of this approach? Is it worth pursuing?

Pat

module PreparedPostgres
  def prepare(method_id)
    if match = /find_(by)_([_a-zA-Z]\w*)/.match(method_id.to_s)
      finder = determine_finder(match)
      attribute_names = extract_attribute_names_from_match(match)
      super unless all_attributes_exists?(attribute_names)

      cols = columns.select { |c| attribute_names.include?(c.name) }
      col_types = cols.collect { |c| ps_type c.type }
      conds = []
      cols.each { |c| conds << "#{c.name}=$#{conds.size + 1}" }
      prep = "PREPARE #{method_id.to_s} (#{col_types.join(", ")}) AS " +
             "SELECT * FROM #{table_name} WHERE #{conds.join(" AND ")};"
      connection.execute prep;

      (class << self; self; end).instance_eval do
        define_method(method_id) do |*args|
          qargs = args.collect { |a| quote a }
          statement = "EXECUTE #{method_id.to_s}(#{qargs.join(", ")});"
          find_by_sql(statement).first;
        end
      end
    end
  end

  private
  def ps_type(type)
    case type
    when :integer then :int
    when :string then :text
    end
  end
end

I am glad to see that in the last week we have gone from "we should do
something" to "here's a proof of concept."

Here's a couple considerations and implications:

1. Only the database connector knows (or should be responsible for knowing)
whether or not the underlying RDBMS supports prepared statements.

2. ActiveRecord needs to be the one responsible for determining whether or not
a prepared statement exists. Otherwise it will generate a lot of unneeded
sql strings that are subsequently ignored in favor of the prepared
statements.

The implication is that the availability of prepared statements must be
exposed as a standardized (boolean) property of the RDBMS connector, and then
the ActiveRecord can ignore (current behavior) or use it later as it matures.

If a mechanism for generating a "footprint" for a prepared statements can be
determined, that is more efficient than generating an SQL string, then we are
in very good shape.

The "find_by_id" idiom is a good trivial case, reflecting 90% of SQL in my own
rails apps. However, it is incomplete.

A trivial footprint consists of the SQL from the "from" clause to the end of
the statement, prior to sanitize/populate parameters. The code for this
pre-exists, and as a String it provides a good key into a hash map. Once
the "from" and "where" clause are computed, the actual prepared statement
could be obtained by something along these lines

    # This is entirely pseudocode. Any resemblance to Ruby is
    # randomly intentional but entirely unreliable

    footprint=... # existing "from ... where ..." code
    if connector.supports_prepared_statement then
      stmt=stmt_map[footprint]
      # populate parameters ...
      ...
      # execute query
      result = stmt.execute
    end
    else
        # existing code
    end

Unless specifically noted (like in find_by_sql), prepared statement queries
would return all columns from all addressed tables. find_by_sql should use
the full sql as a footprint.