postgresql vs mysql ?

Hello Nick.

I'm also a huge PostgreSQL fan, but have found that MySQL has a gigantic advantage in replication functionality -vs- PostgreSQL. The only built-in support for this is is log-shipping, which doesn't allow the replica to be read from, which is very unfortunate. We make backups via replicas, for instance, so that primary database performance is not impacted while the backups are proceeding.

MySQL Cluster is a very weird beast. It's honestly not really MySQL, but just uses the SQL parsing engine to connect into a radically different back end. I read the MySQL Press book on MySQL cluster and was amused when a "disadvantage" of MySQL Cluster was (paraphrased): You cannot alter the schema while the database is online with statements such as ALTER TABLE ADD COLUMN, and ALTER TABLE CREATE INDEX. Who in the world has an application stable enough to accept that limitation? :slight_smile:

In the end, I'd *highly* recommend that you table performance for the time being. Write the application as quickly and efficiently as possible. Spend time on tests and careful factoring rather than performance concerns. You'll have time and money when performance becomes an issue, and with tests and well factored code, you'll have the foundation required to make whatever changes are required to achieve the performance you desire.

MySQL Cluster is a very weird beast. It's honestly not really MySQL, but

seconded :slight_smile:

just uses the SQL parsing engine to connect into a radically different back end. I read the MySQL Press book on MySQL cluster and was amused when a "disadvantage" of MySQL Cluster was (paraphrased): You cannot alter the schema while the database is online with statements such as ALTER TABLE ADD COLUMN, and ALTER TABLE CREATE INDEX. Who in the world has an application stable enough to accept that limitation? :slight_smile:

It's a bit worse than this though... you can't add new tables either. Which gets annoying if you add a new feature to your site and that feature needs a new table... you have to put it into "single node" mode. We probably do this once a month...

Now, one thing that the docs don't (or didn't) make very clear is that "online" and "offline" isn't what you normally think of in this case. "single node" is really just that. All the nodes but one go offline, so you can still serve up content -- as long as that single node can handle the load.

There are other issues to consider when looking at mysql cluster though... at least with 5.0 it's all ram based so if you have a huge dataset, keep that in mind. There is also a 32,000 "affected rows" limitation. You can't update/delete/insert more than 32,000 rows at a time. I've found many times it's actually much less (more like 10,000). Which isn't a problem until you try to do a dump/restore and your dump uses extended inserts. The other big one is that there are no varchars in mysql cluster. Everything is padded out. And there's a row length limit, but I don't recall what it is. Also you can't do "SELECT ... FROM ... WHERE a LIKE ... OR b LIKE" (ie. OR'd LIKEs). And some other odd quirks.

I'm just listing these out cause if you're gonna decide now, better to know all the various gotchas before you get too far down that road :confused:

Philip Hallstrom wrote:

There are other issues to consider when looking at mysql cluster though... at least with 5.0 it's all ram based so if you have a huge dataset, keep that in mind. There is also a 32,000 "affected rows" limitation. You can't update/delete/insert more than 32,000 rows at a time. I've found many times it's actually much less (more like 10,000). Which isn't a problem until you try to do a dump/restore and your dump uses extended inserts. The other big one is that there are no varchars in mysql cluster. Everything is padded out. And there's a row length limit, but I don't recall what it is. Also you can't do "SELECT ... FROM ... WHERE a LIKE ... OR b LIKE" (ie. OR'd LIKEs). And some other odd quirks.

I'm just listing these out cause if you're gonna decide now, better to know all the various gotchas before you get too far down that road :confused:

and about a postgresql cluster what do you think?

Nothing... I've never had the chance to use it. And it's been a long time since I looked at slony which is the only free implementation I believe (I could be totally wrong). There's at least one company that has a commercial version...

Also, look into pgpool. It pretends it's a postgresql database and can send selects to a pool of backend servers and send all inserts/updates/deletes to a single backend server.

Seems like that would come in very handy with rails...

but again, I haven't used that either :slight_smile: