Database Design

Hello,

I have a user table that stores their username, email address, and password. (along with salt, etc)

However, now I want to have more complex information associated with each user (address, etc). I'm no DBA expert, so in terms of performance, which is better:

- A user table that has all the fields in it (Username, Email, password, country, state, zipcode, etc) - A user table that only has username/password/email, that is connected via a foreign key to another table that stores all the other fields.

Which is better? And why?

Thank you

-David Zhu

David Zhu wrote in post #971909:

Hello,

I have a user table that stores their username, email address, and password. (along with salt, etc)

However, now I want to have more complex information associated with each user (address, etc). I'm no DBA expert, so in terms of performance, which is better:

- A user table that has all the fields in it (Username, Email, password, country, state, zipcode, etc) - A user table that only has username/password/email, that is connected via a foreign key to another table that stores all the other fields.

Which is better? And why?

The first will be slightly better in terms of performance, because you're not joining tables. But if your tables are properly indexed, the join should have a negligible effect on performance.

However, I don't see why you'd use the second approach. It's more complex for no particular gain.

Thank you

-David Zhu

Best,

I totally agree with Marnen's post above. The first approach is going to be better in terms of performance than storing it in another table.

The reasons are, basically: 1) Joins on the two tables during queries 2) Index size

So let's say you have "users" and "addresses" and each address has a user_id FK. Cool. Totally valid approach, but again agreeing with Marnen, I can't see a reason why you'd need to store this in another table if each user is only going to have ONE data set for their address information. As Marnen points out, the extra time it would take to perform this query is negligible if your tables are properly indexed.

However, those indexes could cause some additional overhead, both performance wise and financially, down the road. Each of those indexes will grow in size relative to the number of records in each table. So if you have only 5 entries, you won't notice a difference. 5 million though? Now we're talking!

So, while you could certainly do it by including a second address table, it wouldn't necessarily provide a performance benefit, and if you have very large data sets - either now or in the future - could actually be detrimental to performance to some degree. Compensating for that may involve paying more money to your host for more disk space for your DB, possibly for upgraded memory limits as well, etc. But that really depends on the size/scope of your app.

The other consideration that Marnen also alluded to is that you may not necessarily gain an advantage in design by storing address information in a secondary table. My general rule of thumb is: if it's going to have multiple "data sets" (or "records") per user record (in this case), it needs its own table. But if the system's design says "a user has one, and only one, address", then there's no reason to store addresses in a secondary table. They can simply be factored into the first table.

I hope this helps you a bit :slight_smile: Good luck!

The one time you might want to keep addresses, for example, in a separate table, would be if your app had the concept of multiple users at the same address. You might have family membership of a club for example. In this case it would be a bad idea to save the same address a number of times for each member of the family.

Colin

David Zhu wrote in post #971909:

Which is better? And why?

David: I certainly defer to the wise words of Marnen, Phoenix and Colin.

One case where you might want to split one logical entity into two tables is when you have a "transaction" part (frequently changing) and a "static" part (rarely changing). To state the obvious, if you wanted to capture login times and IP addresses for your users, that would certainly be a table separate from the rest of the user information.

If you're keen to learn more philosophy about database design, try a google of "Dimensional Database" and/or "Ralph Kimball".

But beyond that, yeah, one table should suffice for your needs.

- ff

?! `User has_many addresses` is a pretty standard scenario: work and home, billing vs. shipping, and so on, in which case of course it makes sense to have a separate "address" table.

Even with the concept of *one* (home) address per user, I've had a use case that included tracking length of residence, so again, that's definitely a separate table.

Any such design question requires context...

FWIW,

I have a user table that stores their username, email address, and password. (along with salt, etc)

However, now I want to have more complex information associated with each user (address, etc). I'm no DBA expert, so in terms of performance, which is better:

- A user table that has all the fields in it (Username, Email, password, country, state, zipcode, etc) - A user table that only has username/password/email, that is connected via a foreign key to another table that stores all the other fields.

Which is better? And why?

I tend to agree with what everyone else said... in addition to that...

You really *REALLY* need to make sure that the business case is "one address per user". Sure, you're client will tell you that and you'll build it, and next week they'll realize they just assumed that "address" mean "home" and "work" and "vacation house", etc. And now you're gonna rebuild it. The longer it takes for that realization to happen the more you'll have to redo.

Now, maybe that will never happen, and I wouldn't suggest splitting the tables on the 0.0001% chance it might happen, but it's worth taking some time now to consider that possibility.

This sort of thing happens all time time...

a user belongs to a group -> a user can belong to multiple groups a user has a phone number -> a user has lots of phone numbers etc...

The other thing to check, which probably isn't an issue anymore, but it at some point in the past (years maybe, it's all fuzzy :slight_smile: ... some databases would pad out the "record" to accommodate the maximum length for that record. Mostly this came down to issues around picking CHAR vs VARCHAR... So instead of:

Philip>Hallstrom>philip@pjkh.com

You'd get:

Philip |Hallstrom |philip@pjkh.com

So... take your example... if it's padded, and *most* users won't have an address at all, you may potentially use up a lot of space for nothing.

The advantage of doing this is that the database knows exactly where the "last name" field begins in each record so it can find it faster than checking it's internals and seeing where the field starts for each record, and it's easy to update a record "in place" instead of appending it to the end and invalidating the existing row (think of it like fragmentation).

I probably should erase everything I just said about CHAR/VARCHAR as I don't think it's worth considering, but you did ask and it's interesting (to me anyway :slight_smile:

-philip