My app is required to perform some user-driven CRUD operations on two
databases simultaneously. This is not the normal situation, but it
applies to a low-activity admin controller.
My current solution involves three models - DbDual, DbPrimary, and
DbSecondary. DbDual uses the other two within nested transactions,
and DbSecondary does an establish_connection to connect it to the
second database.
This seems to work fine, although it's _very_ slow. I assume the
slowness is due to connecting each time the object springs into
existence.
So my question is, how can I make a connection (and maintain it)
indefinitely?
I've searched the web for solutions, but I'm not seeing what I think I
want. I think I just want a simple connection made during boot, and
then be able to tell a model to use that connection.
Wild guess: I think Rails is single-threaded, and if you make a connection persistent you will eat all the resources in no time. I believe Passenger loads a Rails app per connection, but as soon as the client is served the app gets unloaded; if the connection is permanent, I think the app would never unload.
So maybe in Rails 3 you could have a pool of ActiveRecord instances, with the rest of Rails dynamically loaded by Passenger.
I'm actually not using Passenger. I use a cluster of 50 mongrels
behind an Apache2. Since my app started pre Rails 2.0 (and is only
now up to 2.3.8), I'm not using connection pooling (as far as I
know). My understanding is that each mongrel instance maintains
its own db connection. I figure having each instance maintain two
connections (one to each db) shouldn't be a bad thing.
As far as resources go, this app is the only thing running on a big
Sun box. I'm only using a tiny fraction of the 20GB of ram (in fact,
my biggest resource problem is pure CPU processing performance). I'm
really not a fan of Sun's large multicore "green thread" processors.
They're dog slow per core.
For what it's worth, Oracle is the database server. I don't think
that should make a difference to my solution.
class DbSecondary < ActiveRecord::Base
  establish_connection :connection_name
end
This does what it says: it connects DbSecondary (and its descendants, if
any) to the named database connection. If you suspect that this doesn't
work as intended, check what the logs of your DB server say. If you see
that new connections to the DB are established for each request, make
sure that class caching is really enabled in the production environment.
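For reference, the symbol form of establish_connection looks up a
like-named entry in config/database.yml. A minimal sketch of what such
an entry might look like (the entry name, adapter, and credentials here
are illustrative placeholders; the adapter name depends on which Oracle
adapter gem you use):

```yaml
# config/database.yml -- names and credentials are placeholders
connection_name:
  adapter: oracle_enhanced   # or whatever Oracle adapter gem is installed
  database: secondary_db
  username: app_user
  password: secret
```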
Regarding transactions: don't expect any coordination over and above
what you do yourself. ActiveRecord does not give you distributed
transactions. I haven't tried this myself, so big grains of salt are in
order. Let's say you have code like this:
DbDual.transaction do
  DbPrimary.transaction do
    # do something in first database
  end
  DbSecondary.transaction do
    # do something in second database
  end
end
In execution, this code is translated to two independent transactions,
one on each of the databases. The DbPrimary transaction is mapped to a
SAVEPOINT/RELEASE SAVEPOINT pair within the DbDual transaction. The
DbSecondary transaction is completely independent. However, if this
latter transaction raises a (suitable) exception, it still aborts the
enclosing DbDual transaction in addition to the DbSecondary one.
If a failure occurs immediately after the DbSecondary.transaction is
committed, for example because the connection to the first database was
lost, the second transaction will be committed whereas the first one
will be rolled back when the DB server decides it has timed out (or
whatever else might cause it to drop the transaction).
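The failure window described above can be illustrated with a toy
in-memory model of the two connections. This is a simulation of the
semantics, not ActiveRecord code; ToyDb and its behavior are invented
for the example:

```ruby
# Toy model: each "database" makes its writes visible only on commit.
class ToyDb
  attr_reader :rows
  def initialize
    @rows = []
  end

  def transaction
    pending = []
    yield pending
    @rows.concat(pending)  # commit: pending writes become visible
  rescue => e
    raise e                # rollback: pending writes are discarded
  end
end

primary   = ToyDb.new
secondary = ToyDb.new

begin
  primary.transaction do |p|
    p << "row in primary"
    secondary.transaction do |s|
      s << "row in secondary"
    end                                  # secondary commits here...
    raise "connection to primary lost"   # ...then primary fails
  end
rescue RuntimeError
  # swallow the simulated connection failure
end

primary.rows    #=> []                    (rolled back)
secondary.rows  #=> ["row in secondary"]  (already committed)
```

The second transaction's commit cannot be undone once the first one
fails, which is exactly the lack of distributed-transaction coordination
being described.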
Yes, I'm making the connection in my model as you describe. That is
working fine in the sense that it is connecting to the secondary
database.
My problem is, it appears to be connecting every time I instantiate
that object. I've traced the entire establish_connection path, and it
appears to be creating a new ConnectionPool object each time. What's
happening in establish_connection() is a little confusing to me.
As far as I can tell, there is no re-use of connections to the
secondary database.
Regarding my transactions, I'm doing it like this:
prod.transaction do
  test.transaction do
    begin
      prod_result = prod.connection.execute(prod_sql)
      test_result = test.connection.execute(test_sql)
    rescue
      Util.lge "/// update exception"
      Util.lge "/// col_names = #{col_names.join(',')}"
      Util.lge "/// values = #{values.join(',')}"
      Util.lge "/// prod_sql = #{prod_sql}"
      Util.lge "/// test_sql = #{test_sql}"
      Util.lge "/// exception = #{$!}"
      raise ActiveRecord::Rollback
    end
  end
end
My transaction code seems to work, but I need to test more failure
scenarios.
Something's suspicious here. The behavior of establish_connection is
correct (I think), but I don't see why it would be called more than
twice for each instance of your app: once each for the primary and
secondary databases. If establish_connection is called more often than
that in the production environment(!), you ought to investigate the
reason. Usually, class caching is enabled there and connections are
retained across requests.
Another angle of attack is to set a breakpoint (or puts caller)
somewhere deep down in the database adapter to see when and from where
connections are established.
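One way to get that trace without a debugger is to wrap the connect
method and dump `caller`. A sketch of the general pattern, demonstrated
on a stand-in class (in a real app you would apply it to the adapter's
connect method from an initializer; ConnectTracer and FakeAdapter are
invented names):

```ruby
# Sketch: wrap an instance method so every call logs its call site.
module ConnectTracer
  def self.trace(klass, method_name)
    original = klass.instance_method(method_name)
    klass.send(:define_method, method_name) do |*args|
      $stderr.puts "#{klass}##{method_name} called from:"
      $stderr.puts caller.first(5)       # top few frames of the backtrace
      original.bind(self).call(*args)    # delegate to the real method
    end
  end
end

# Stand-in for a database adapter:
class FakeAdapter
  def connect
    :connected
  end
end

ConnectTracer.trace(FakeAdapter, :connect)
FakeAdapter.new.connect   # prints the call site, still returns :connected
```

If connections really are being re-established per request, the printed
backtraces should make it obvious which code path is responsible.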
I tested this in production mode, and it's _much_ faster. It's fast
enough that I'm almost certain that it's not reconnecting to the
secondary database each time (at least in production mode). I don't
know if it's just overhead of other development behaviors or if it's
actually doing things differently (with respect to connections/pools/
reconnects); and I don't have time to investigate deeper.
Did you originally write that your problem is with behavior in the
development environment? I was assuming production. The reason for the
behavior you notice is simple: in the development environment, class
caching is disabled; in other words, classes are reloaded on each
request so that changes are immediately effective. An intended
consequence is that class variables and class instance variables (i.e.,
instance variables of class objects) are re-initialized, too. Among
those are the variables that hold the database connection.
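The effect is easy to reproduce in plain Ruby: throwing away the class
object and redefining it (which is roughly what dev-mode reloading does)
also throws away its class instance variables. The names here are
hypothetical; only the caching mechanism mirrors ActiveRecord's:

```ruby
# A class-level memoized "connection", cached in a class instance variable.
class Widget
  def self.connection
    @connection ||= Object.new   # stands in for an expensive connect
  end
end

first = Widget.connection
Widget.connection.equal?(first)   # true: cached across calls

# Simulate dev-mode reloading: drop the class and define it again.
Object.send(:remove_const, :Widget)
class Widget
  def self.connection
    @connection ||= Object.new
  end
end

Widget.connection.equal?(first)   # false: new class object, fresh @connection
```

Every reload produces a fresh class object with a nil @connection, so
each request in development pays the connection cost again.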
This was all in development mode. As slow as it is in development
mode, I'd never release it to production. Fortunately, for reasons
you explained above, it does not suffer the same performance problems
in production mode.
I suppose that since I was previously working with only the one
database, its connection was maintained outside the scope of class
caching. But with this new connection defined and used within a model,
the lack of caching must have been causing a new connection on each use.