how to deal with gender

I like Peter Hickman's answer the best, especially the seperating midname by spaces - doh why didn't I think of that....really good to bounce this around about, although I'll probably not bother with the suffixes. Thanks chaps.

Yours sincerely,

bb aka Sir William Bobby Seadorf Lawrence Elliott Fred-Smith the 2nd

Peter Hickman wrote:

To be honest I would go with

user_id :integer title :string forename :string midname :string surname :string suffixes :string gender :string

and treat midname as a space separated list of names (although this wont handle names like 'st clair' correctly).

Storing multiple values in a single column break First Normal Form (1NF) and is always a bad idea IMHO. If fact having multiple fields for the parts of names also technically breaks 1NF because it is a repeating group. name1, name2, name3, etc. Naming them as above actually obscures this fact.

For a fully robust solution we need to eliminate the repeating group:

Baby id :integer gender :string

Methods: def full_name    @components = NameComponent.order(:position)    @components.join(' ') end

def name_component_at_index(index)   NameComponent.where(:position => index) end

def append_name_component   ...   ... end

def insert_name_component_at_index(index)   ...   ... end

NameComponent id :integer baby_id :integer position :integer name :string

Although I showed some example convenience methods above, one could use something like acts_as_list (or of the derivatives of that) for managing the list of name components.

This is a great example of a case where normalization is doing it wrong. Unless you're somehow expecting additional genders to be added, this sort of a solution is overkill and pretty much only nets you additional database overhead. Using an attribute should suffice...

--Matt Jones

If it is a “baby name chooser” application then each “name” needs to know whether the name can be “male”, “female” or “both.” I would create male? and female? as separate fields that can be either true or false. This simplistic approach to design is superior to creating a both option in one field.

Peter Hickman wrote:

and treat midname as a space separated list of names (although this wont handle names like 'st clair' correctly).

Comma separated, then?

I don't know if it's annoying anyone else, but I've been following this thread with interest, but every time I read "gender" it grates. The word you're after is "sex". "Male" and "female" are sexes. "Masculine", "feminine", "neutral", etc, are genders...

I appreciate that it seems to have become a convention to use the word "gender" as a "polite" reference to the discrimination between innies and outies... but for the vast, overwhelming majority there are only two choices; either being male or female, and that's your sex (whether it changes over the duration of your life is a different debate :slight_smile:

Now, I'd store that in a text field in the record as either the full word, or as "m" or "f", and load the DB and the model up with constraints and validations, and not worry about having a Sex model, with associated "sexes" table. Move along to other areas of your application.

Regarding the multiple names issue; in response to the suggestion of having multiple name midname fields, I'd ask "why?" What's the ultimate purpose of storing those names? Generally it's to just be able to pull them all out of the DB to put on a letterhead when needed, and in that event, you're optimising too far to give them each a field.

For, again, the vast majority of cases, the following will suffice: first_name middle_names last_name (although that assumes the WASP model, which the majority of the world has been forced to adjust it's naming models to, arguably wrongly as that may have been)

If you want to track whether the name "Michael Pavling" should be addressed formally as "Mr Pavling" or as "Michael-san", or whatever, you need to have some way of tracking for each person whether the first_name or the last_name is the "family" name, or I'd consider a "salutation" field which stored that value (even though it duplicates values, breaking normalisation a little).

Michael Pavling wrote:

What's the best way of to deal with gender.

i.e. a Person is either Male or Female (generally and fot my purposes I'll stick with that rule).

I don't know if it's annoying anyone else, but I've been following this thread with interest, but every time I read "gender" it grates. The word you're after is "sex". "Male" and "female" are sexes. "Masculine", "feminine", "neutral", etc, are genders...

I appreciate that it seems to have become a convention to use the word "gender" as a "polite" reference to the discrimination between innies and outies...

Not exactly. Sex is biological, gender is psychosocial. There are people of male sex and feminine gender, and vice versa.

but for the vast, overwhelming majority there are only two choices; either being male or female, and that's your sex (whether it changes over the duration of your life is a different debate :slight_smile:

No. Your sex doesn't change over the duration of your life if you're a human (at least, not without surgery, but that's a different debate). Your gender may. I know genderqueer people who don't consistently identify with one gender or another, despite whatever genitalia God gave them.

Now, I'd store that in a text field in the record as either the full word, or as "m" or "f", and load the DB and the model up with constraints and validations, and not worry about having a Sex model, with associated "sexes" table. Move along to other areas of your application.

Agreed if we're talking about sexes. For genders, there are more than two categories, and the list might change as other gender categories are recognized (yes, the issue is more complex than you'd think), so I'd probably use a model or at least an enumeration for that.

Best,

I can't help but think of the commercial (Visa I think) playing during the world cup where the child had all the names of players on a winning squad from an earlier world cup....

Deignating a particular cardinality just looks kinda wrong to me, tho of course everyone else's mileage is pretty much guaranteed to vary.

How about just midnames? Then you won't run into problems with typical medieval/Renaissance Spanish names (where they tacked on long lineages), or that guy whose name starts with 128 repetitions of "Shri". :slight_smile: Or possibly "midname" and "more_midnames"? Or you could go the other way and separate the name only into family_name and given_names. Question is, is it ever important to have a given specific one of the (given) names?

-Dave

interestingly enough - that's exactly what I've done.

gone with...

forename:string midnames:string surname:string

midnames seperated by spaces basically, in the vast majority of cases I'd expect 0, 1 or more rarely 2 midnames; pragmatically it just seemed like the right thing to do in this case.

bingo bob wrote:

interestingly enough - that's exactly what I've done.

gone with...

forename:string midnames:string surname:string

midnames seperated by spaces basically, in the vast majority of cases I'd expect 0, 1 or more rarely 2 midnames; pragmatically it just seemed like the right thing to do in this case.

What you're doing is most likely going to "work" and will be easy to implement, but your solution is still bad database design. You are relying on a repeating group, multiple values in a single column.

It's fine what you're doing, as long as you realize that it is bad design and you can live with that.

For more details see:

I had considered suggesting going with a complete separate table of names, containing a person_id (or in this use-case, fullname_id), sequence number, and string, but that seemed like overkill. It would be "normal", but IMHO HAGNI. :slight_smile:

-Dave

Dave Aronson wrote:

For more details see: First normal form - Wikipedia

I had considered suggesting going with a complete separate table of names, containing a person_id (or in this use-case, fullname_id), sequence number, and string, but that seemed like overkill. It would be "normal", but IMHO HAGNI. :slight_smile:

I definitely agree that there are cases where breaking normalization for other gains is desirable. I just don't completely agree that this is one of those cases. In this case following out to 3NF is beneficial. It would provide for a more flexible and more elegant design. It will also simplify the design for asking more "interesting" questions like, "Give me a list of all names that have Dale as a middle name."

Maybe questions like that aren't a concern now, but breaking normal forms in this case will make questions like that more difficult (and complex) to ask later if the need arises. Why "build in" known limitations when there is no known reasons for do so?

I understand the desire for simple design, but not at the cost of good design. Not when there's no clear reason to avoid the better design.

Robert Walker wrote:

I definitely agree that there are cases where breaking normalization for other gains is desirable. I just don't completely agree that this is one of those cases. In this case following out to 3NF is beneficial. It would provide for a more flexible and more elegant design. It will also simplify the design for asking more "interesting" questions like, "Give me a list of all names that have Dale as a middle name."

Maybe questions like that aren't a concern now, but breaking normal forms in this case will make questions like that more difficult (and complex) to ask later if the need arises. Why "build in" known limitations when there is no known reasons for do so?

I understand the desire for simple design, but not at the cost of good design. Not when there's no clear reason to avoid the better design.

I tend to normalise as often as possible, and I do agree with you Robert. But http://c2.com/cgi/wiki?YouArentGonnaNeedIt also raises a very good point.

Pale Horse wrote:

I tend to normalise as often as possible, and I do agree with you Robert. But http://c2.com/cgi/wiki?YouArentGonnaNeedIt also raises a very good point.

Given that the primary reason for flattening data, that could otherwise be normalized, is for performance optimization reasons, http://c2.com/cgi/wiki?PrematureOptimization is the balance to YAGNI.

Robert Walker wrote:

Pale Horse wrote:

I tend to normalise as often as possible, and I do agree with you Robert. But http://c2.com/cgi/wiki?YouArentGonnaNeedIt also raises a very good point.

Given that the primary reason for flattening data, that could otherwise be normalized, is for performance optimization reasons, http://c2.com/cgi/wiki?PrematureOptimization is the balance to YAGNI.

Tautological, is this matter. Normalising the database *is* indeed, almost always the correct methodology.