Multibyte Character References

Michael_Wang · September 8, 2007, 1:06am

Mark Dodwell wrote:

Hi,

I have a load of records in my database which were imported through processing a YAML file. These original YAML files were created from the 'to_yaml' function of an array of Hash objects.

The YAML file contains multibyte character references such as:

...and between them and today\xE2\x80\x99s College. The scope, r...

When I imported this data into my DB these character references have changed but are still there in the DB:

...and between them and today\342\200\231s College. The scope, r...

So I have two questions:

1) Are the original characters retrievable from the copy in the DB, or has it been mangled?

2) If the above answer is yes, then how!

Really appreciate any help on this one. Many thanks in advance.

~ Mark

What's the encoding in the YAML file (presumably UTF-8), what database are you using and what encoding is your database/table set to?
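For reference, the two escaped forms quoted above look like the same three bytes, written once in hex and once in octal. The sketch below checks that and decodes them as UTF-8. It assumes a modern Ruby (2.0 or later, for `String#b` and `force_encoding`), which is newer than what was current when this thread was written; the variable names are just for illustration.

```ruby
# The same three bytes written two ways: hex escapes (as in the YAML file)
# and octal escapes (as shown for the copy in the database).
hex_form   = "\xE2\x80\x99".b
octal_form = "\342\200\231".b

# Both spellings denote the bytes 226, 128, 153.
same_bytes = (hex_form == octal_form)

# Interpreted as UTF-8, those three bytes form a single character.
quote     = hex_form.dup.force_encoding("UTF-8")
codepoint = format("U+%04X", quote.ord)   # RIGHT SINGLE QUOTATION MARK

puts same_bytes
puts codepoint
puts quote.valid_encoding?
```

If that holds for the real data, the apostrophe in "today's" has not been lost; it is only being displayed as escaped bytes.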

Michael_Wang · September 8, 2007, 1:22am

Mark Dodwell wrote:

Hi Michael,

The DB is 'ISO Latin 1 (latin1)' encoding.

I'm not sure about the original YAML file (do you know the default encoding for .to_yaml?) - but when I open it directly with, say TextMate, it shows the character reference *not* the actual character.

Thanks,

~ Mark

MySQL, if that's what you are using, lets you set the character encoding at several levels (server, database, table, column). If you are using MySQL, you could try something like an ALTER TABLE to change the encoding to UTF-8 (which I'm guessing is what the original YAML data is in). You might have to export the data and import it into a table that's already set to UTF-8, though. In that case, if you still have all the YAML data around, it might be easier just to reload it with the table set to the proper encoding.

http://dev.mysql.com/doc/refman/5.0/en/charset.html
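One case a table encoding change would not cover: if the stored text contains the escape sequences literally (a backslash followed by digits) rather than the raw bytes. A minimal sketch for turning such text back into characters follows, assuming a modern Ruby (2.0 or later) and assuming the data really is UTF-8 underneath. `unescape_bytes` is a hypothetical helper, not part of Ruby or Rails, and it would also rewrite any backslash sequences that were meant literally.

```ruby
# Hypothetical helper: turn literal "\342"-style octal or "\xE2"-style hex
# escape text back into raw bytes, then label the result as UTF-8.
def unescape_bytes(text)
  bytes = text.b.gsub(/\\x([0-9A-Fa-f]{2})|\\([0-3][0-7]{2})/) do
    # $1 holds two hex digits, $2 holds three octal digits.
    ($1 ? $1.hex : $2.oct).chr
  end
  bytes.force_encoding("UTF-8")
end

# Single quotes keep the backslashes literal, mimicking escaped text in a column.
stored = 'and between them and today\342\200\231s College'
fixed  = unescape_bytes(stored)

puts fixed
puts fixed.valid_encoding?
```

Checking `valid_encoding?` on the result is a cheap way to spot rows where the UTF-8 assumption does not hold before writing anything back.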
