Here's an outline for a solution for supporting multiple connections
to duplicate databases. I'm not an expert in production environments,
so whilst the tutorial works, I'm not sure how it will play out in
"the real world".
I've pinged Alex @ twitter so hopefully he'll have some feedback too.
I like the idea behind this a lot. Is there any reason this couldn't
be implemented without requiring changes to the way models are
called? Right now the magic_multi_connections has you do
conn::Person.find(:all). Couldn't you just use your random clone
selection during ActiveRecord's 'execute' method?
I haven't dug too deep in your code but I'm curious if there's a
reason this would be tough.
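To make the suggestion concrete, here is a toy sketch (not the real ActiveRecord internals, and `FakeConn` is invented) of routing at the `execute` level: SELECT statements go to a random clone, everything else to the master.

```ruby
# Stand-in for a real adapter connection; it just labels the SQL it runs.
FakeConn = Struct.new(:name) do
  def execute(sql)
    "#{name}: #{sql}"
  end
end

class ConnectionRouter
  def initialize(master, clones)
    @master = master
    @clones = clones
  end

  # The idea from the thread: hook the single choke point every query
  # passes through, inspect the SQL, and delegate to a connection.
  def execute(sql)
    conn = sql =~ /\A\s*select/i ? @clones.sample : @master
    conn.execute(sql)
  end
end

router = ConnectionRouter.new(
  FakeConn.new("master"),
  [FakeConn.new("clone1"), FakeConn.new("clone2")]
)
router.execute("UPDATE people SET name = 'x'")  # => "master: UPDATE people SET name = 'x'"
router.execute("SELECT * FROM people")          # served by clone1 or clone2
```

This keeps `conn::Person.find(:all)` out of application code entirely, at the cost of not letting the caller choose a connection, which is exactly the objection raised next.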
I’m lazy - here’s a cut-and-paste of an answer from the comments on the article:
"that would be great if you didn’t care which requests went to which
connection. I think some apps might want write requests to go to the
master DB, and read requests (select) to be distributed to the slaves.
“Though perhaps we could use your idea and pass a :connection option in the crud methods; or perhaps :connection_group.”
I might investigate implementing these, just for laughs and giggles.
I'm not really feeling it. Connecting to multiple database instances
to provide greater database scaling should be transparent. It seems to
me that *that* was Alex's biggest gripe. I'd rather see the
transparency either a.) at the adapter level or b.) through the use of
load balancing (load balancing is great for non-HTTP traffic).
If you are trying, on the other hand, to connect different models to
different databases, or even the same model to different databases at
different times, then there are already patterns to do that (which you
know.)
Perhaps the following database.yml syntax cleanly describes the
desired usage of a hidden connection pool with different CRUD
operations for an assumed one master-many slaves setup (one master
only being a subset of this).
production:
  adapter: mysql
  ...
  read_only:
    clone1:
      ...
    clone2:
      ...
We'd then use cloneN for select calls when read_only is specified.
If people think this is nice syntax for specifying multiple
connections, I'll go and implement it.
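One possible way to consume that syntax (the clone hosts below are invented for illustration): everything under read_only becomes a candidate spec for SELECTs, and anything else keeps using the master spec.

```ruby
require 'yaml'

# Parse a database.yml fragment in the proposed syntax.
config = YAML.load(<<~YML)
  production:
    adapter: mysql
    database: app
    read_only:
      clone1:
        host: 10.0.0.1
      clone2:
        host: 10.0.0.2
YML

prod   = config["production"]
clones = (prod["read_only"] || {}).values

# For a select call, merge a random clone's settings over the master's;
# with no read_only section we simply fall back to the master spec.
read_spec = clones.empty? ? prod : prod.merge(clones.sample)
```

Everything the clone doesn't override (adapter, database name, credentials) is inherited from the production entry, which matches the "..." shorthand in the example.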
production:
  adapter: mysql
  ...
  read_only:
    clone1:
      ...
    clone2:
      ...
This is nice. I'm also wondering... how hard would it be to do the connection pool trick for multi-master?
If that makes any sense from a load-balancing POV (I think it does for highly write-intensive apps) and it is manageable in the code, then I'd like to be able to specify the role (write/read/both) for each node.
E.g.
production:
  adapter: mysql
  cluster:
    node1:
      host: X.Y.Z.1
      role: write
    node2:
      host: X.Y.Z.2
      role: read
    node3:
      host: X.Y.Z.3
      role: read
Just to rebut my own idea - I don't know all the permutations that
uber-production cluster setups use, nor do I know people who use
them. So the chance of me making an arse of this multiple-connections
business is high.
Nonetheless, it occurred to me that some slaves might be read-write,
so we'd need some syntax for that too.
There is also the case where you have federated data, so you'd have:
db1: users 1 - 100k
db2: users 101k - 200k
db3: users 201k - 300k
etc
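That federated layout could be expressed as a simple range-to-database map (the connection names below are hypothetical):

```ruby
# Hypothetical shard map: each range of user ids lives on its own database.
SHARDS = {
  (1..100_000)       => :db1,
  (100_001..200_000) => :db2,
  (200_001..300_000) => :db3,
}.freeze

# Return the connection name responsible for a given user id.
def shard_for(user_id)
  SHARDS.each { |range, db| return db if range.cover?(user_id) }
  raise ArgumentError, "no shard for user #{user_id}"
end

shard_for(150_000)  # => :db2
```

The hard part isn't the lookup, of course; it's that every query touching users has to be routed through something like this, and cross-shard joins stop working.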
I don't think you can possibly handle all the different cluster setups
out there. As long as the developer can manually pick the
connection in the code if need be, it will be clean for those with
simple setups (read-write to master, read-only slaves), and uglier but
still manageable for the massive, extremely complicated setups.
Yeah, I think any syntax should be fine for implementation - it'll get
mapped into two lists, read and write, so the read-write connections
will appear in both.
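That mapping is small enough to sketch. The node data echoes the cluster example above, except node3 is given the hypothetical role "both" so a node lands in both lists:

```ruby
# Parsed node specs, as they might come out of the cluster syntax.
nodes = {
  "node1" => { "host" => "X.Y.Z.1", "role" => "write" },
  "node2" => { "host" => "X.Y.Z.2", "role" => "read"  },
  "node3" => { "host" => "X.Y.Z.3", "role" => "both"  },
}

# Whatever the YAML looks like, it collapses to two lists.
reads  = nodes.select { |_, n| %w[read both].include?(n["role"]) }.keys
writes = nodes.select { |_, n| %w[write both].include?(n["role"]) }.keys
# reads  => ["node2", "node3"]
# writes => ["node1", "node3"]
```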
Thought: given that concurrency in Rails is obtained with additional
processes, each Rails instance/Mongrel probably doesn't need a
connection to all DBs. But might be over-assumptive.
Inserts and updates in Rails are followed by fetches to retrieve ids.
So having write-only databases might get ugly.
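A toy illustration of why: MySQL's LAST_INSERT_ID() is per-connection, so the follow-up id fetch must hit the same connection that performed the insert. The `FakeNode` class below is invented to mimic that behaviour:

```ruby
# Mimics a database node where the last generated id is only visible
# on the connection that ran the INSERT.
class FakeNode
  attr_reader :last_insert_id

  def insert(id)
    @last_insert_id = id  # only this connection learns the new id
  end
end

writer = FakeNode.new
reader = FakeNode.new

writer.insert(42)
writer.last_insert_id  # => 42
reader.last_insert_id  # => nil; another node knows nothing about it
```

So a write has to be followed by at least one read on the writing connection, which is why a strict write-only role doesn't fit Rails' insert-then-fetch pattern.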
Thought: given that concurrency in Rails is obtained with additional
processes, each Rails instance/Mongrel probably doesn't need a
connection to all DBs. But might be over-assumptive.
Might that be solved by deploying a different database.yml for different processes/machines?
Inserts and updates in Rails are followed by fetches to retrieve ids.
So having write-only databases might get ugly.
There is also the case where you have federated data, so you'd have:
db1: users 1 - 100k
db2: users 101k - 200k
db3: users 201k - 300k
etc
Hmm, federated data will need special treatment if cross-node relations are allowed; I may be naive, but it seems things could get tricky quickly.
I do absolutely agree, though, with what you say below: it should at least be possible for developers to address a specific connection.
I don't think you can possibly handle all the different cluster setups
out there. As long as the developer can manually pick the
connection in the code if need be, it will be clean for those with
simple setups (read-write to master, read-only slaves), and uglier but
still manageable for the massive, extremely complicated setups.
Yes, it should be trivial for simple setups and doable for data-partitioned clusters.
I really enjoy seeing you all getting your geek on, but isn't it a lot easier to just use a MySQL cluster?
“Standard MySQL clients: These are no different for MySQL Cluster than they are for standard (non-Cluster) MySQL. In other words, MySQL Cluster can be accessed from existing MySQL applications written in PHP, Perl, C, C++, Java, Python, Ruby, and so on.”
Ignoring Manfred’s superb suggestion for a moment to continue to exercise geekiness…
Rob Sanheim and I chatted and resolved that an initial solution would be restricted to read-write databases only, with no read-only slaves, to guarantee access to recently inserted/updated data. Perhaps this isn't necessary, but it seemed a decent initial restriction.
Back to Manfred's seemingly accurate dismissal of the whole idea of multiple connections for the purpose of master-slave databases.
So if twitter’s DB connection issues can be solved with MySQL Clusters, what is their DB problem then?
Quite true. If you are using strictly MySQL I agree.
That said, however, I'm liking Nic's ideas as they are trying to solve
the issue of connecting to any combination of Oracle / Postgres /
MySQL / ... databases very nicely.
I think the "best" approach might be to simply have a hook into the
connection selection method using a block that is yielded certain
"interesting" information about the pending operation and is expected
to return the name of the connection to use for the operation. The
ability to specify a global default and override it on specific models
would be nice, too. In general, I think that for anyone who /needs/
this type of behavior, such a setup would not be considered too
complicated; and it means you don't have to decide on a least common
denominator for the various situations that one might run in to. What
it provides is a central place to configure the behavior, as well as a
convention.
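A minimal sketch of that hook (all names here are invented, not an existing API): the app registers a block that is yielded facts about the pending operation and returns the name of the connection to use.

```ruby
# Central registry for the user-supplied connection-selection block.
class ConnectionSelector
  def self.hook(&block)
    @hook = block
  end

  # Yield the "interesting" facts about the pending operation to the
  # hook; fall back to a default connection if none is registered.
  def self.connection_for(info)
    @hook ? @hook.call(info) : :default
  end
end

# A global default: spread reads over the slaves, send writes to master.
ConnectionSelector.hook do |info|
  info[:operation] == :select ? [:slave1, :slave2].sample : :master
end

ConnectionSelector.connection_for(model: "Person", operation: :update)  # => :master
```

Per-model overrides would then just be a second, more specific registry consulted before the global block; the point is that all the routing policy lives in one configurable place.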
One point to make about MySQL Clusters is that they may not be the
silver bullet people think they are: they use the NDB storage
engine, which is in-memory (as opposed to MyISAM and InnoDB), so each
cluster node needs enough memory to hold the entire database in main
memory. That gets more expensive the bigger your database grows.