- Make sure your database character set is utf8
- Make sure all your tables have a character set of utf8
- Make sure your database.yml has 'encoding: utf8' set for each database
None of these steps are required officially unless you use utf-8
specific features of the database (collation). The last setting seems
to set the connection encoding, which shouldn't be required unless
there is non-utf8 data stored in the database.
Not true! Collation and character set are separate things.
There are a couple of obvious reasons you want your database character set to be UTF8 if you're storing UTF8 strings:
1. When you access the database through the mysql (or pgsql, or other) command line, or through tools such as CocoaMySQL, you want strings to display properly.
2. MySQL never treats strings as binary; they always have a character set, which is latin1 (CP1252) by default. Putting UTF8 data into fields marked as latin1 seems like asking for trouble. (There are some byte values that are invalid in CP1252, so technically strings containing those bytes are illegal. It's only through MySQL's laziness in not checking the strings when the connection and table character sets match up that you can get away with this at all.)
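To illustrate the point about invalid bytes, here is a minimal sketch in plain Ruby. It assumes strict Windows-1252 (CP1252), which leaves five byte values undefined; how a particular MySQL version treats those bytes in its own latin1 may differ. The constant and helper names are made up for the example.

```ruby
# Byte values that strict Windows-1252 (CP1252) leaves undefined.
UNDEFINED_IN_CP1252 = [0x81, 0x8D, 0x8F, 0x90, 0x9D].freeze

# Hypothetical helper: returns the bytes of a string that have no
# meaning if the same bytes are read as CP1252.
def bytes_undefined_in_cp1252(string)
  string.bytes.select { |byte| UNDEFINED_IN_CP1252.include?(byte) }
end

a_acute = "\u00C1" # "Á", stored in UTF-8 as the bytes C3 81
e_acute = "\u00E9" # "é", stored in UTF-8 as the bytes C3 A9

problem_bytes_a = bytes_undefined_in_cp1252(a_acute) # contains 0x81
problem_bytes_e = bytes_undefined_in_cp1252(e_acute) # empty
```

So some perfectly ordinary UTF8 characters are already illegal as CP1252 data, while others just happen to slip through.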
There are even worse potential pitfalls here too. On one of our projects, we did everything except set the connection encoding. What happened was that a UTF8 string in Rails would be regarded as CP1252 by MySQL, but MySQL knew that the tables needed UTF8, so it did a CP1252 to UTF8 conversion on the (already UTF8) string before writing it. As you can imagine, we ended up with all sorts of crap in the database, and the occasional string got completely munged as invalid CP1252 bytes were replaced with question marks.
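The double conversion can be reproduced without a database. This is only a rough sketch: it assumes Ruby 1.9 or later (which postdates the $KCODE setup discussed here), and it uses `force_encoding` and `encode` as stand-ins for what the MySQL connection was doing. The method name is hypothetical.

```ruby
# Hypothetical stand-in for a connection that believes incoming bytes
# are CP1252 and converts them to UTF-8 for a utf8 table.
def misconverted_for_utf8_table(utf8_string)
  # Step 1: reinterpret the UTF-8 bytes as CP1252 (no bytes change).
  misread = utf8_string.dup.force_encoding("Windows-1252")
  # Step 2: convert "from CP1252" to UTF-8, replacing bytes that
  # CP1252 does not define with a question mark.
  misread.encode("UTF-8", invalid: :replace, undef: :replace, replace: "?")
end

cafe    = "caf\u00E9" # "café"; é is the bytes C3 A9
a_acute = "\u00C1"    # "Á"; the bytes C3 81, and 0x81 is undefined in CP1252

stored_cafe    = misconverted_for_utf8_table(cafe)    # "cafÃ©"
stored_a_acute = misconverted_for_utf8_table(a_acute) # "Ã?"
```

The first result is the familiar two-characters-per-accent garbage; the second shows where the question marks come from.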
These three things should at least be reduced to a single setting to avoid mistakes. I can't imagine a situation in which you would want to do one of them without the others.
- Put $KCODE='u' in your environment.rb
This is only required if you use unicode strings in your Ruby code.
If your app handles UTF8, then you're going to want to write tests involving UTF8 strings, so you're going to need this turned on. You do write UTF8 tests for your apps, right? :)
- Add an after_filter to application.rb to set the Content-Type
header correctly
Rails now defaults to utf-8 Content-Type.
Good to know. I'll take this as an endorsement of the idea that UTF8 should be the default for Rails apps. :)
Cheers,
Pete Yandell