UTF-8 as default database encoding

As you might know, the db:create rake task creates UTF-8 databases for
MySQL, PostgreSQL and SQLite: http://dev.rubyonrails.org/ticket/8448

The only problem is that you need to manually add the encoding to your
database config file.
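
For illustration, here is a minimal sketch of what that manual step can look like. The database names, usernames and environment layout are placeholders, and the values `utf8` (MySQL) and `unicode` (PostgreSQL) are assumed here as the commonly used settings rather than taken from the patch itself. The YAML is wrapped in a Ruby heredoc only so the snippet can be run to check that it parses.

```ruby
require "yaml"

# Hypothetical database.yml contents; names and credentials are placeholders.
DATABASE_YML = <<~YAML
  development:
    adapter: mysql
    encoding: utf8        # assumed value for the MySQL driver
    database: myapp_development
    username: root
    password:
    host: localhost

  test:
    adapter: postgresql
    encoding: unicode     # assumed value for the PostgreSQL driver
    database: myapp_test
    username: myapp
    password:
YAML

# Parse the YAML and print the encoding configured for each environment.
CONFIG = YAML.safe_load(DATABASE_YML)
CONFIG.each do |env, settings|
  puts "#{env}: adapter=#{settings['adapter']} encoding=#{settings['encoding']}"
end
```

In a real application only the YAML part would live in config/database.yml; the `encoding:` line is the piece that has to be added by hand today.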

Because I'm lazy, and I believe most people are, I submitted a patch to
add the utf-8/unicode encoding to the auto-generated database.yml
file. A lot of people still don't know that you can define the encoding
to use, and I think this patch should help.

http://dev.rubyonrails.org/ticket/8701

What do you think?

Matt Aimonetti

It's not just about being lazy, it's first and foremost about Doing the Right Thing™. When the encoding of the database is UTF-8, it's not strange to assume that the driver should also be in UTF-8 mode. Not many people know that you have to set the encoding in the database driver, or why.

Manfred

AIUI, the server encoding and the client encoding (here defined by the driver) don't necessarily have to match. In cases where the client encoding doesn't match the server encoding, the server translates from the client encoding to the server encoding. For related PostgreSQL documentation, see

http://www.postgresql.org/docs/8.2/interactive/multibyte.html

I'd be surprised if other database servers don't have similar functionality.

I think a more likely problem is when the HTML charset and client encoding don't match.

Of course, if the default is UTF-8 for the HTML, it makes sense to use UTF-8 in the database client (driver) as well.
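
To make the distinction concrete, here is a small sketch that uses Ruby's own transcoding as a stand-in for the server-side translation; no database is involved, and the sample string is just an example. It contrasts text whose encoding is declared correctly (and so translates cleanly) with UTF-8 bytes that are wrongly labelled as ISO-8859-1.

```ruby
# Illustration only: String#encode stands in for the translation a server
# performs between client and server encodings.
ORIGINAL = "Français" # UTF-8 text

# Encoding declared correctly: the text is transcoded to ISO-8859-1 and
# back to UTF-8 without loss.
LATIN1     = ORIGINAL.encode("ISO-8859-1")
ROUND_TRIP = LATIN1.encode("UTF-8")

# Encoding declared wrongly: the UTF-8 bytes are labelled ISO-8859-1 and
# then "translated" to UTF-8, so each non-ASCII character is garbled.
GARBLED = ORIGINAL.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts ROUND_TRIP # prints "Français"
puts GARBLED    # prints "FranÃ§ais"
```

The translation itself is harmless; the damage comes from bytes being interpreted under the wrong label, which is the kind of mismatch described above.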

Michael Glaesemann
grzm seespotcode net

Manfred, 'lazy' was indeed not the best word to describe the reason
behind this patch but thanks for your support.

Here is an example of what could happen if you use utf-8 data without
the proper encoding set in your driver.

This is what you should see, a drop down with 3 languages named in
their own language.
http://www.railsontherun.com/assets/2007/6/22/utf8db-encoding-html_thumb.png

This is what you get if you don't set the encoding to utf-8 in your
database.yml file:
http://www.railsontherun.com/assets/2007/6/22/utf8-db_no_encoding_utf8_html_thumb.png

And just for fun, here is the same list with the HTML charset set to
ISO-8859-1:
http://www.railsontherun.com/assets/2007/6/22/utf8db-no_encoding-iso8859-1_thumb.png

The above screenshots were taken using MySQL. When using PostgreSQL
without defining the encoding for the db driver, the data is displayed
properly (in UTF-8, as shown in screenshot #1).

I added the encoding setting to the PostgreSQL template because Manfred
asked for it and also because I think it's good to be as consistent as
possible.

-Matt

Patch merged in

http://dev.rubyonrails.org/changeset/7116

Thanks to bitsweat :)