Database/program design question

Hi, I'm a bit new to the MVC design and all the fun stuff that goes along with Ruby on Rails. I just got RoR working on my host (I'm stuck on version 1.1.6, though), and I was looking for advice on how I should design this in my mysql database.

I am working on a system for image galleries. Essentially, I have several Users, and each user could potentially have several (has_many) picture galleries. Sort of like facebook or myspace picture galleries. My problem is this: each gallery is going to have to have it's own table (I think) because there could be a really large amount of pictures, and I'll need columns for comments, description, filename, etc. But using something from RoR that I've learned so far, I can do has_many :galleries, but then I only get 1 gallery model - linked to just one table. I guess I could set the foreign key (user_id) to each row in that table, and then store the name of the specific gallery table (gallery_kopf1988_1)... but then how do I get that gallery working in the same Model-View-Controller fashion?

I hope this makes some sense...

i don't think this makes much sense you won't get any performance by having multiple tables and it will make your code very difficult to handle and you'll have to write things multiple times or code outside the model

i would use a gallery table and pictures :belongs_to "gallery" comments (and maybe descriptions) have their own tables (:belongs_to "picture")

or can a picture belong to several galeries, too? then it would be a :has_and_belongs_to relation (since a gallery will have multiple pictures) in that case you would need to have them in one table anyway

if the galleries are per user, things become even more complicated and the approach of having your own table per gallery will fail. another problem would be, that you can't create new galleries if you need to create a table for them

Okay, so I'll have to just have every picture in the same table then, belonging to a specific gallery. I was just thinking this might not be scalable once I get to a whole lot of pictures.

So the gallery "kopf1988" will then "has_many :pictures" And then the pictures will "has_many :comments"

Okay... should be simple enough. And I can even use the comments model for the comments that I'll have in other parts of the website, so that could keep things DRYer.

Let's hope I remember all this by the time I get home.

Don't worry about scalability until you have to. Odds are, you won't. You should be so lucky!

///ark

This topic is kind of fortuitous for me- I've actually been looking for an answer to something like this, through the archives of this group and with Google. I understand what you're saying about optimizing prematurely, but I have been that lucky before, and it was, contrary to the saw, not a nice problem to have, at least for those who had to deal with it. I worked on a site where people hadn't really worried about scalability, and it bit us a bit when the site's traffic increased a thousandfold over a very short period of time. So I think there's a balance to be struck there. One thing I learned from that experience is that some things are often pretty easy to deal with- using a poor algorithm, if it can just be rewritten, or failing to cache as much as you should, for instance. But once you have actual data it can be hard to deal with problems with db architecture cleanly and quickly.

I'm wondering about a situation in which you're offering an application to the public, letting anyone sign up to use it, with what is conceptually their own instance of it. Take something like Blogger, just as an example. It's pretty easy to map out a simple data design for an individual blog... if you need to host tens of thousands of them it is trickier. Let's say that blogs have posts, and posts have comments. One way to handle this is to stick all of the comments into one large table with a foreign key that references posts (and do the same for posts, relative to blogs). If you tune the db well, and have enough RAM in your machine(s) to hold the entire dataset, this is likely to scale pretty well for quite a while, but I think at some point you may hit a wall with this approach- after all, if the average post has 100 comments, and the average blog has 1000 posts, and there are 10,000 blogs you're talking about 1,000,000,000 records (which, of course, would be a nice problem to have :wink: ). Beyond that, it's less clear than the original "one blog" problem. Instead of just having posts, all of which belong to the one blog in question, you now have to worry about which blog a post belongs to, which is not as parsimonious as one (or at least I) would ideally like.

If you're writing this from scratch, without using a framework, there are a lot of ways to deal with this (including the one big table approach), and there are trade-offs in each case (at least in each case worth considering). If you're willing to give up portability across databases you could, for instance, use a namespace mechanism, like Postgres's schemas, creating a new schema for each instance of a blog. This has the virtue of letting you embed the instances in an overarching schema that handles information that is common across all blogs- for instance commenters might be registered with one name across all of the blogs. You could create a new set of tables for each registered blog, and duplicate tables by munging some sort of prefix onto each table name- that would be really ugly though. You could go really far and virtualise each instance of the app, though that would create a headache dealing with data that crossed instances.

So I guess what I'm asking is whether or not Rails has any canonical way of dealing with this beyond the one big table approach. Blogger is just an example- I'm more interested in the general answer than in specific details about how a blogging site should/would work in Rails, as I have no intention of trying to write a new Blogger. I'm pretty new to Rails, and I figured that if I kept digging I'd find the answer myself, but since it came up on this list I figured I'd get the question into this thread.

Thanks

This sort of thing is not now, nor is likely to be in the future, a problem, providing you're using a reasonably capable relational database management system. Let's say anything from MySQL up - I haven't tried using SQLite for anything substantial so I can't vouch for its capabilities in the medium-large DB arena (I'd forgive it if it couldn't cope, though!)

Why not a problem? Indexes. And beyond that, partitioning can also be pretty useful, in DBMS' that provide it. You don't need to pin your entire table in RAM, perish the thought. You would hope to be able to store a fair chunk of the index, though, with enough left over for data page caching, so that the areas being referenced frequently are served up without recourse to disk. And you get most of that for free. At least you do with Oracle, SQL Server and Sybase, to name the three with which I have most familiarity.

The thing is, if you're facing the problem of an imminent billion-row table then you should be in the position of being able to hire in some database consultancy, if you don't already have a full-time DBA. Or team of DBAs.

So consider this as another vote for the don't-worry-be-happy party.

Regards,

Mike

This sort of thing is, presently, a huge problem if you have to deal with real traffic. At my last job I was responsible for apps that averaged more than 5 million visitors/day, with both daily and seasonal peaks. "Indexes" were not a silver bullet in that case- we had to do even simple things in complicated ways to make things scale. Many webapps are actually pretty simple- making them scale to millions of users/day is the hardest bit, in my experience. I appreciate that you took the time to answer, but I'm not looking for advice about what a db will or will not handle. I have some experience with that, including the experience of having a db completely fall over because my traffic went from 1,000 visitors/day to >5 million/day in less than two hours.

I'm just looking for an answer to a simple question. Is there a standard way, in Rails, to deal with many instances of one application? Other than chucking everything into one big table? If no, fine. If yes, a pointer to the right docs would be great. I know it's tempting to respond to this sort of request with an aphorism, but I'd like to ask that you just assume that I know what I'm doing, db-wise, and that I have a good reason for asking the question.

Thanks