This is not a question but a report on the difficulties I had and the solution I found with respect to UTF-8, YAML::load, and Ruby/Rails.
Comments are appreciated.
- - -
I had been struggling for two days to get UTF-8 working in my Rails app.
I have a localization file, lib\locale\de.yml, that was encoded as ISO-8859-1, and I could not get its contents to display properly.
Marnen, quite correctly, suggested that I switch to UTF-8. Of course, I had tried to do that, but I could not get the YAML localization file to load.
What I had done was load the ANSI (i.e. ISO-8859-1) localization file into Notepad, convert it to UTF-8, and save the file.
Then all my German (de.yml) localizations failed.
It turns out that Notepad places "\xEF\xBB\xBF" at the beginning of the file to indicate that this is a UTF-8 file (these three bytes are the UTF-8 byte order mark, or BOM).
These three bytes appear to screw up YAML::load.
Gimme a break!
Not only does Notepad put in these indicator bytes ... so does TextMate.
In fact, TextMate will happily determine that your non-"\xEF\xBB\xBF" file is a UTF-8 file and will automatically reinsert the indicator bytes. I find this rather hysterical (not in a good way), since in "Handling encodings (UTF-8)" one of the authors of TextMate wrote: "Property 3 turns out to be attractive because it means we can heuristically recognize UTF-8 with a near 100% certainty by checking if the file is valid. Some software think it’s a good idea to embed a BOM (byte order mark) in the beginning of an UTF-8 file, but it is not, because the file can already be recognized, and placing a BOM in the beginning of a file means placing three bytes in the beginning of the file which a program that use the file may not expect...".
How thoughtful that TextMate does what the article says it should not do. If there is a way to turn off that behavior, I can't find it. Maybe there's a TextMate bundle ... who knows?
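To check whether an editor has slipped these three bytes into a file, something like the following sketch could be used. The constant and method names are made up for illustration, and it assumes Ruby 2.0 or later (for String#b), which may well be newer than the Ruby used elsewhere in this post.

```ruby
# The three bytes of the UTF-8 byte order mark, as a binary string.
UTF8_BOM = "\xEF\xBB\xBF".b

# Returns true if the file at `path` starts with the UTF-8 BOM.
# (Hypothetical helper, not part of Ruby or Rails.)
def starts_with_utf8_bom?(path)
  # binread returns raw bytes; only the first three are read.
  File.binread(path, 3) == UTF8_BOM
end
```

For example, starts_with_utf8_bom?("lib/locale/de.yml") should report whether de.yml still carries the indicator bytes.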
In order to get YAML::load to load the localization, I have to remove the three indicator bytes. Yuck!
Once I did that, YAML loads happily.
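An alternative to editing the file is to skip the indicator bytes at read time. The sketch below assumes a Ruby that understands the "BOM|UTF-8" read mode (Ruby 1.9.2 or later, to the best of my knowledge); the method name is hypothetical. It only helps for files you load yourself and does not change how Rails loads its locale files.

```ruby
require "yaml"

# Loads a YAML file as UTF-8, ignoring a leading BOM if one is present.
# (Hypothetical helper for illustration.)
def load_yaml_ignoring_bom(path)
  # "r:BOM|UTF-8" asks Ruby to skip a UTF-8 BOM at the start of the file
  # and to tag the remaining text as UTF-8.
  text = File.open(path, "r:BOM|UTF-8") { |f| f.read }
  YAML.load(text)
end
```

The file on disk is left untouched, so an editor that keeps reinserting the bytes does no harm to this particular code path.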
- - - - - - - - -
If you store your locales in lib/locale and you use the AVAILABLE_LOCALES idiom as suggested in http://rails-i18n.org/wiki/pages/i18n-available_locales then you can use this in config\initializers\available_locales.rb:
- - -
# See Rails Internationalization (I18n) API — Ruby on Rails Guides
#
# Get loaded locales conveniently
# See http://rails-i18n.org/wiki/pages/i18n-available_locales
module I18n
  class << self
    def available_locales; backend.available_locales; end
  end

  module Backend
    class Simple
      def available_locales; translations.keys.collect { |l| l.to_s }.sort; end
    end
  end
end

# You need to "force-initialize" loaded locales
I18n.backend.send(:init_translations)

AVAILABLE_LOCALES = I18n.backend.available_locales
RAILS_DEFAULT_LOGGER.debug "* Loaded locales: #{AVAILABLE_LOCALES.inspect}"

# Shnelvar: Remove UTF-8 indicator bytes so that YAML::load works
AVAILABLE_LOCALES.each do |localization_name|
  # localization_name is, e.g. "de"
  localization_name_dot_yml = localization_name + '.yml'
  localization_file_name = File.join('lib/locale', localization_name_dot_yml)
  yaml_str = IO.read(localization_file_name)

  utf_8__3_byte_indicator = "\xEF\xBB\xBF"
  if yaml_str[0..2] == utf_8__3_byte_indicator
    yaml_str = yaml_str[3...yaml_str.size]
    File.open(localization_file_name, "w") { |f| f << yaml_str }
    puts localization_file_name + ' has had the UTF-8 indicator bytes removed'
  end
end
- - -
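One caveat about the initializer: on Ruby 1.9 and later, yaml_str[0..2] selects the first three characters rather than the first three bytes, so the comparison may never match there. A byte-oriented variant might look like the sketch below. The names are hypothetical, and it assumes Ruby 1.9.3 or later for File.binwrite and String#byteslice.

```ruby
# The UTF-8 BOM as a binary (ASCII-8BIT) string.
UTF8_BOM_BYTES = [0xEF, 0xBB, 0xBF].pack("C3")

# Removes a leading UTF-8 BOM from the file at `path`, rewriting it in place.
# Returns true if the file was changed, false otherwise.
# (Hypothetical helper for illustration.)
def strip_utf8_bom!(path)
  data = File.binread(path)                    # raw bytes, no transcoding
  return false unless data.start_with?(UTF8_BOM_BYTES)

  File.binwrite(path, data.byteslice(3..-1))   # everything after the BOM
  true
end
```

Reading and writing in binary mode also avoids any newline translation on Windows, so the rest of the file is written back byte for byte.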
Suggestions and comments are welcome.