Database Design

Hello,

I have a user table that stores their username, email address, and
password. (along with salt, etc)

However, now I want to have more complex information associated with
each user (address, etc). I'm no DBA expert, so in terms of
performance, which is better:

- A user table that has all the fields in it (Username, Email,
password, country, state, zipcode, etc)
- A user table that only has username/password/email, that is
connected via a foreign key to another table that stores all the other
fields.

Which is better? And why?

Thank you

-David Zhu

David Zhu wrote in post #971909:

Hello,

I have a user table that stores their username, email address, and
password. (along with salt, etc)

However, now I want to have more complex information associated with
each user (address, etc). I'm no DBA expert, so in terms of
performance, which is better:

- A user table that has all the fields in it (Username, Email,
password, country, state, zipcode, etc)
- A user table that only has username/password/email, that is
connected via a foreign key to another table that stores all the other
fields.

Which is better? And why?

The first will be slightly better in terms of performance, because
you're not joining tables. But if your tables are properly indexed, the
join should have a negligible effect on performance.

However, I don't see why you'd use the second approach. It's more
complex for no particular gain.

Thank you

-David Zhu

Best,

I totally agree with Marnen's post above. The first approach is going
to be better in terms of performance than storing it in another table.

The reasons are, basically:
1) Joins on the two tables during queries
2) Index size

So let's say you have "users" and "addresses" and each address has a
user_id FK. Cool. Totally valid approach, but again agreeing with
Marnen, I can't see a reason why you'd need to store this in another
table if each user is only going to have ONE data set for their
address information. As Marnen points out, the extra time it would
take to perform this query is negligible if your tables are properly
indexed.

However, those indexes could cause some additional overhead, both
performance wise and financially, down the road. Each of those
indexes will grow in size relative to the number of records in each
table. So if you have only 5 entries, you won't notice a difference.
5 million though? Now we're talking!

So, while you could certainly do it by including a second address
table, it wouldn't necessarily provide a performance benefit, and if
you have very large data sets - either now or in the future - could
actually be detrimental to performance to some degree. Compensating
for that may involve paying more money to your host for more disk
space for your DB, possibly for upgraded memory limits as well, etc.
But that really depends on the size/scope of your app.

The other consideration that Marnen also alluded to is that you may
not necessarily gain an advantage in design by storing address
information in a secondary table. My general rule of thumb is: if
it's going to have multiple "data sets" (or "records") per user record
(in this case), it needs its own table. But if the system's design
says "a user has one, and only one, address", then there's no reason
to store addresses in a secondary table. They can simply be factored
into the first table.

I hope this helps you a bit :slight_smile: Good luck!

The one time you might want to keep addresses, for example, in a
separate table, would be if your app had the concept of multiple users
at the same address. You might have family membership of a club for
example. In this case it would be a bad idea to save the same address
a number of times for each member of the family.

Colin

David Zhu wrote in post #971909:

Which is better? And why?

David: I certainly defer to the wise words of Marnen, Phoenix and Colin.

One case where you might want to split one logical entity into two
tables is when you have a "transaction" part (frequently changing) and a
"static" part (rarely changing). To state the obvious, if you wanted to
capture login times and IP addresses for your users, that would
certainly be a table separate from the rest of the user information.

If you're keen to learn more philosophy about database design, try a
google of "Dimensional Database" and/or "Ralph Kimball".

But beyond that, yeah, one table should suffice for your needs.

- ff

?! `User has_many addresses` is a pretty standard scenario: work
and home, billing vs. shipping, and so on, in which case of course it
makes sense to have a separate "address" table.

Even with the concept of *one* (home) address per user, I've had a
use case that included tracking length of residence, so again, that's
definitely a separate table.

Any such design question requires context...

FWIW,

I have a user table that stores their username, email address, and
password. (along with salt, etc)

However, now I want to have more complex information associated with
each user (address, etc). I'm no DBA expert, so in terms of
performance, which is better:

- A user table that has all the fields in it (Username, Email,
password, country, state, zipcode, etc)
- A user table that only has username/password/email, that is
connected via a foreign key to another table that stores all the other
fields.

Which is better? And why?

I tend to agree with what everyone else said... in addition to that...

You really *REALLY* need to make sure that the business case is "one address per user". Sure, you're client will tell you that and you'll build it, and next week they'll realize they just assumed that "address" mean "home" and "work" and "vacation house", etc. And now you're gonna rebuild it. The longer it takes for that realization to happen the more you'll have to redo.

Now, maybe that will never happen, and I wouldn't suggest splitting the tables on the 0.0001% chance it might happen, but it's worth taking some time now to consider that possibility.

This sort of thing happens all time time...

a user belongs to a group -> a user can belong to multiple groups
a user has a phone number -> a user has lots of phone numbers
etc...

The other thing to check, which probably isn't an issue anymore, but it at some point in the past (years maybe, it's all fuzzy :slight_smile: ... some databases would pad out the "record" to accommodate the maximum length for that record. Mostly this came down to issues around picking CHAR vs VARCHAR... So instead of:

Philip>Hallstrom>philip@pjkh.com

You'd get:

Philip |Hallstrom |philip@pjkh.com

So... take your example... if it's padded, and *most* users won't have an address at all, you may potentially use up a lot of space for nothing.

The advantage of doing this is that the database knows exactly where the "last name" field begins in each record so it can find it faster than checking it's internals and seeing where the field starts for each record, and it's easy to update a record "in place" instead of appending it to the end and invalidating the existing row (think of it like fragmentation).

I probably should erase everything I just said about CHAR/VARCHAR as I don't think it's worth considering, but you did ask and it's interesting (to me anyway :slight_smile:

-philip