> I'm convinced this can be done in a simple and effective manner.
I'm sure it can. Based on your comments, you seem to be going for a coarser-grained solution than the problem I'm trying to solve calls for.
Doing this through DNS implies a number of things:
1. You can wait for DNS propagation (assuming you're talking about the Internet and not an intranet).
2. You have the liberty to create a separate application cluster per customer (all using essentially the same code base, with a config per customer).
Assuming both are true for you, I would agree that you can do this pretty simply.
My situation is different.
I need quick provisioning for new customers (on the order of seconds to a couple of minutes), and I cannot consider automatic provisioning of a new application cluster each time for a new customer.
In my situation I have to be able to share application clusters (running on many machines) with a number of databases on the back end.
To CWK's points:
I'm not claiming that building a multi-tenant system by breaking the DB up per tenant is trivial in my case, but it is also not as dire as you portray it. And I generally agree with you: I would like to have everything in one DB for maintenance, but reality is forcing my hand.
Firstly: the application I'm working on is a live Internet application and is already database-limited. Performance of the middleware is not even remotely a factor; our Web servers are basically asleep. So my primary concern is scaling the DB layer.
We've already investigated a number of possible cluster/federation schemes and they do not scale nearly as well as the vendors would like you to believe. In our tests data partitioning per customer has given, by far, the best overall performance boost.
My comment about big numbers is this: no matter what your DB solution is, there is some number of aggregate rows in the DB at which performance diminishes "quickly". Generally speaking, you index your data to get better (read) performance, but indexing provides the maximal benefit when the indexes of all your hot tables either fit in RAM or result in very few hits to disk. As time goes on, however, fewer and fewer of your indexes will fit in RAM, and even your less-hot tables and indexes become significant. Your DB starts to become disk bound, and "disk bound" is one foot in the grave. So what to do? I can't ignore it.
Right now we have logically partitioned customer data in the same DB, all living cozily inside the same schema. This does make many things easy, but it makes scaling VERY hard.
> At large numbers things start to behave very differently as off-the-shelf solutions kind of cease to work, period.
Well, I think you would agree that this is hyperbole. Off-the-shelf solutions (RDBMSes, you mean) are clearly not the fastest things on the planet, but they keep the data malleable. I'm not aware of alternatives that offer both the relative ease of data manipulation of RDBMSes and the reasonably good performance they possess. (Ah, perhaps Google has some goodies in-house; alas, I'm not Google.)
Keeping RDBMSes operating at a healthy level can be done for a long time, but you will eventually need to give up some comfort. In this case the "all-in-one-db" approach.
> This is a !@#$-load of tricky plumbing to avoid setting and watching client entitlements on database rows. Let alone the havoc this could wreak with managing the database--depending on which one you use, this could complicate how you deal with tablespaces and such.
No doubt there is some plumbing that needs to be put into place. But I'm doing this primarily to gain performance, and I gain it on two levels: the DB server in question is operating on relatively "small" DBs, meaning intrinsically improved performance, and then there are logical performance improvements I can make. Right now I have to do security checks for the "owners" as well as other users (our application allows clients to publish their data to other users of our system as well as the general public). Once an owner (or an assistant, who has the same privileges as the owner) has logged in, I need to perform no checks on access to their data.
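To make that second level concrete, here is a minimal sketch of the idea — names like `CustomerRouter`, the config keys, and the `listings` table are invented for illustration, not from our actual app. Once login identifies the customer, the app selects that customer's database, and the ownership predicate simply disappears from every query:

```ruby
# Hypothetical sketch of per-customer DB routing. The CustomerRouter name,
# the config entries, and the "listings" table are all invented.
DB_CONFIGS = {
  "acme"   => { adapter: "mysql", database: "app_acme" },
  "globex" => { adapter: "mysql", database: "app_globex" }
}

class CustomerRouter
  # Provisioning a new customer reduces to creating a DB + adding a config.
  def self.connection_for(customer)
    DB_CONFIGS.fetch(customer)
  end

  # Shared-schema design: every query carries an ownership predicate.
  def self.shared_schema_query(owner_id)
    "SELECT * FROM listings WHERE owner_id = #{Integer(owner_id)}"
  end

  # One-DB-per-customer: the predicate (and its index) simply vanish.
  def self.per_customer_query
    "SELECT * FROM listings"
  end
end
```

The point of the sketch is the last method: with the customer resolved at login, the per-row security check is gone from the hot path.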
> - Incremental application migration
> This is beneficial if you want to maintain multiple software versions. If that's the case then you might as well just install complete app instances per client and be done with it. Been there many times, will never do it again unless building something like an ERP where it still makes some sense.
Agreed, this can be a pain. I wasn't referring to maintaining an arbitrary number of versions of the app, just no more than two at once: old and new. This situation would only be temporary, while a roll-out was occurring.
> - Overall better performance
> TANSTAAFL. If your database server is running twenty database instances, there is going to be some kind of performance hit to that versus one DB with tables 20 times larger. The overhead associated with connection pools, query caches, et al. could in many cases be much larger than the hit from scanning tables 20 times longer. I just don't accept this as an open-and-shut benefit right off the bat.
The benefit comes from the fact that I can run 1 or more DB instances on a given DB server. How I want to tune performance is entirely up to me. In fact, whatever tuning trick you can do in a single-DB-instance setup to gain performance I can do with a one-DB-per-customer setup, but the converse is not true: there are things that can be done in a one-DB-per-customer configuration that cannot be done in a single-DB approach. I'm not saying that I can accomplish this easily, but it can be done.
> - The ability to manage performance better (one big hot client can be moved to their own DB server)
> There's no reason you can't do this with a multi-tenant system too. For that matter you can run a special client on their own complete system instance with no or very little fancy plumbing.
In my case the DB, not the web application, is the problem. Otherwise, yes, I agree with you.
> The DB dumper is something that has to be maintained! You're not getting out of the fact that you will have to do work to make it seem like each tenant is an island.
> Snap response: Implement some kind of to_sql method which can be called recursively through the object tree, starting with the root object representing a client. For all I know facilities for this already exist within ActiveRecord, which after all has to know how to generate SQL. Or just serialize stuff into YAML, or something like that.
> Not to mention that you may find (as I did) that clients want/like human-readable backups, not SQL dumps.
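For what it's worth, the YAML variant of that snap response can be sketched with nothing but the Ruby stdlib. `Client` and `Project` here are invented stand-ins; a real version would recurse through the ActiveRecord associations hanging off the client root object:

```ruby
require "yaml"

# Stand-in model classes; a real implementation would walk the
# ActiveRecord association tree rooted at the client object instead.
Client  = Struct.new(:name, :projects)
Project = Struct.new(:title, :tasks)

# Recursively reduce the object tree to plain hashes and arrays,
# which YAML serializes into a human-readable per-client dump.
def client_to_tree(client)
  {
    "client"   => client.name,
    "projects" => client.projects.map do |p|
      { "title" => p.title, "tasks" => p.tasks }
    end
  }
end

acme = Client.new("Acme", [Project.new("Redesign", ["wireframes", "copy"])])
dump = YAML.dump(client_to_tree(acme))  # the human-readable backup
tree = YAML.safe_load(dump)             # restore, as plain data
```

This gives the human-readable property "for free", though as noted below, doing the serialization in Ruby itself has a real speed cost.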
But with a per-client DB approach *I* get the ability to back up and restore data on a per-client basis, a far more regular occurrence. And I can do this using high-performance tools without writing anything (except making my app work like this). As for the "human-readable" part, I could also write such a script. Also, in case you haven't tried it, serializing with Ruby is REAL SLOW, and loading the data with Ruby is no race winner either. Clearly no one has to be confined to using Ruby to do this. But yet again the per-client DB approach wins for flexibility out of the box.
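To make the "high-performance tools without writing anything" point concrete, here is a sketch using the stock MySQL client tools (the `app_<client>` database-naming convention is invented; pg_dump works the same way for PostgreSQL):

```ruby
# Sketch: per-client backup/restore via the stock MySQL tools.
# The app_<client> naming scheme is an invented convention.
def client_db_name(client)
  "app_#{client}"
end

def backup_command(client, outfile)
  # --single-transaction takes a consistent InnoDB snapshot without locking.
  ["mysqldump", "--single-transaction",
   "--result-file=#{outfile}", client_db_name(client)]
end

def restore_command(client, infile)
  "mysql #{client_db_name(client)} < #{infile}"
end
```

Because each client is a whole database, backing up or restoring one client is just pointing the standard dump tool at that client's DB — no application code in the data path at all.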
> I still think the "not trivial" aspect understates it by two-thirds.
Fair enough.
> It seems to me like you're building a unique configuration that will end up having a lot more dependencies on the versions of the framework, O/S, database config, etc. than is obvious from this vantage point. The end result could be that every time you do a major rev of any piece, you risk the whole thing falling apart and being the one guy in the world with that specific problem, and needing to stay on MySQL 3.1 for a year after its release until the low-priority bug gets fixed. Yeah, I know it's a hypothetical, but it's the kind of hypothetical that's bitten me in the rear multiple times. The all-in-one approach has been by far the easiest to maintain and operate of all the approaches I've been involved with.
It's a possibility, but I generally doubt it.
> Well, like I said above, I agree that it poses certain challenges--you end up needing to build a high-performance application even though all your customers are 5-seat installations. I do agree that this is ultimately probably an issue best solved in the database, but I'm not sure that the approach posited here isn't trading getting stabbed for getting shot.
In my case (unlike Neil) I'm doing this explicitly for the performance benefits I can attain.
Your warnings have been heard, I will take them into account. Much appreciated.
Jim Powers