Possible solution to multiple connections

Here's an outline for a solution for supporting multiple connections to duplicate databases. I'm not an expert in production environments, so whilst the tutorial works, I'm not sure how it will play out in "the real world".

I've pinged Alex @ twitter so hopefully he'll have some feedback too.

http://drnicwilliams.com/2007/04/12/magic-multi-connections-a-facility-in-rails-to-talk-to-more-than-one-database-at-a-time/

I like the idea behind this a lot. Is there any reason this couldn't be implemented without requiring changes to the way models are called? Right now the magic_multi_connections has you do conn::Person.find(:all). Couldn't you just use your random clone selection during ActiveRecord's 'execute' method?

I haven't dug too deep in your code but I'm curious if there's a reason this would be tough.

Great idea BTW :slight_smile:

::J. Danger

I’m lazy - he’s a cut+paste of an answer from the comments of the article:

"that would be great if you didn’t care which requests went to which connection. I think some apps might want write requests to go to the master DB, and read requests (select) to be distributed to the slaves.

“Though perhaps we could use your idea and pass a :connection option in the crud methods; or perhaps :connection_group.”

I might investigate implementing these, just for laughs and giggles.

I'm not really feeling it. Connecting to multiple database instances to provide greater database scaling should be transparent. It seems to me that *that* was Alex's biggest gripe. I'd rather see the transparency either a.) at the adapter level or b.) through the use of load balancing (load balancing is great for non-HTTP traffic).

If you are trying, on the other hand, to connect different models to different databases, or even the same model to different databases at different times, then there are already patterns to do that (which you know.)

V/r Anthony

I agree that it should be transparent.

Perhaps the following database.yml syntax cleanly describes the desired usage of a hidden connection pool with different CRUD operations for an assumed one master-many slaves setup (one master only being a subset of this).

production:   adapter: mysql   ...   read_only:     clone1:       ...     clone2:       ...

We'd then use cloneN for select calls if read_only was specified.

If people think this is nice syntax for specifying multiple connections, I'll go an implement it.

Nic

Dr Nic wrote:

production:   adapter: mysql   ...   read_only:     clone1:       ...     clone2:       ...

This is nice, I'm also wondering... how hard would it be to do the connection pool trick for multi master?

If that has any sense from a load balancing pov (I think it does for highly write-intensive apps) and it is manageable in the code, then I'd like to be able specify the role (write/read/both) for each node. E.g. production:    adapter: mysql    cluster:      node1:        host: X.Y.Z.1        role: write      node2:        host: X.Y.Z.2        role: read      node3:        host: X.Y.Z.3        role: read

or:

production:    adapter: mysql    cluster:      node1:        host: X.Y.Z.1        role: readwrite      node2:        host: X.Y.Z.2        role: readwrite      node3:        host: X.Y.Z.3        role: readwrite

Luca

Just to retort my own idea - I don't know all the permutations that uber-production cluster setups use, nor do I know people that use them. So the chance of me making an arse of this multiple connections business is high.

Nonetheless, it occurred to me that some slaves might be read-write, so some syntax for that:

production:   adapter: mysql    ...    read_only:      clone1:        ...      clone2:        ...    read_write:      master1:        ...      master2:        ...

There is also the case where you have federated data, so you'd have: db1: users 1 - 100k db2: users 101k - 200k db3: users 201k - 300k etc

I don't think you can possibly handle all the different cluster setups out there. Just as long as the developer can manually pick the connection in the code if need be, it will be clean for those with simple setups (rw to master, read-only slaves), and a uglier but manageable for the massive, extremely complicated setups.

- Rob

Yeah I think any syntax should be ok for implementation - it'll get mapped into two lists: read and write; so the read/write connections will be in both.

Thought: given that concurrency in Rails is obtained with additional processes, each Rails instance/Mongrel probably doesn't need a connection to all DBs. But might be over-assumptive.

Inserts + Updates in rails are followed by fetches to retrieve ids. So, having write_only databases might get ugly.

Nic

Dr Nic wrote:

Thought: given that concurrency in Rails is obtained with additional processes, each Rails instance/Mongrel probably doesn't need a connection to all DBs. But might be over-assumptive.

might it be solved deploying different database.yml for different processes/machines?

Inserts + Updates in rails are followed by fetches to retrieve ids. So, having write_only databases might get ugly.

ugh I didn't think of it :frowning:

Luca

Rob Sanheim wrote:

There is also the case where you have federated data, so you'd have: db1: users 1 - 100k db2: users 101k - 200k db3: users 201k - 300k etc

hmm federated data will need special treatment if cross-node relations are allowed, i may be naive but it seems that things can get tricky soon.

I do absolutely agree though with what you say below, that is should be at least possible for the developers to address a specific connection.

I don't think you can possibly handle all the different cluster setups out there. Just as long as the developer can manually pick the connection in the code if need be, it will be clean for those with simple setups (rw to master, read-only slaves), and a uglier but manageable for the massive, extremely complicated setups.

yes it should be trivial for simple setups and doable for data-partitioned clusters

Luca

I really enjoy seeing you all getting your geek on, but isn't it a lot easier to just use a MySQL cluster?

   “Standard MySQL clients: These are no different for MySQL Cluster than they are for standard (non-Cluster) MySQL. In other words, MySQL Cluster can be accessed from existing MySQL applications written in PHP, Perl, C, C++, Java, Python, Ruby, and so on.”

   Manfred

Ignoring Manfred’s superb suggestion for a moment to continue to exercise geekiness…

Rob Sanheim and I chatted and resolved that an initial solution would be restricted to read-write databases only, no read-only slaves to guarantee you had access to immediately recently insert/updated data. Perhaps this isn’t necessary but it seemed a decent initial restriction.

Back to Manfred seemingly accurate dismal of the whole idea of multiple connections for the purpose of master-slave databases.

So if twitter’s DB connection issues can be solved with MySQL Clusters, what is their DB problem then?

Nic

Quite true. If you are using strictly MySQL I agree. That said, however, I'm liking Nic's ideas as they are trying to solve the issue of connecting to any combination of Oracle / Postgres / MySQL / ... databases very nicely.

I think the "best" approach might be to simply have a hook into the connection selection method using a block that is yielded certain "interesting" information about the pending operation and is expected to return the name of the connection to use for the operation. The ability to specify a global default and override it on specific models would be nice, too. In general, I think that for anyone who /needs/ this type of behavior, such a setup would not be considered too complicated; and it means you don't have to decide on a least common denominator for the various situations that one might run in to. What it provides is a central place to configure the behavior, as well as a convention.

Val Aleksenko is about to release a plugin to support the read-only database.yml syntax discussed earlier.

The rubyforge project has been finally approved (took them longer this time). The plugin README: http://revolutiononrails.blogspot.com/2007/04/plugin-release-actsasreadonlyable.html

Excellent. I’m sending everyone to you, and to MySQL clusters. :slight_smile:

Cheers Nic

One point to make about MySQL Clusters is that they may not be the silver bullet people think they are as they uses the NDB storage engine which is in-memory (as opposed to MyISAM and InnoDB), thus each cluster node needs enough memory to hold all of the database in main memory. That gets more expensive the bigger your database grows.

http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-overview.html

Paul.

If you're trying to scale your database you're likely to already have the whole thing in memory anyway.

Kind regards, Thijs