Different Iconv behavior with the same Ruby version

11155 · January 23, 2012, 5:10pm

Hi all,

This problem is driving me nuts. I am using Iconv.conv to convert from UTF-8 to ISO-8859-1:

Iconv.conv('iso-8859-1//IGNORE', 'utf-8', @data).html_safe

Both locally and on production the Ruby version is 1.9.3p0 (Rails 3.0.3), but it raises the following exception only on production:

A Iconv::IllegalSequence occurred in newsletters#show:

"e acompanham, na"... app/controllers/newsletters_controller.rb:19:in `conv'

If I delete that part of the text, it raises again in another location. This is really strange because the contents locally and on production are exactly the same. Here is the text I am trying to convert (user-created data): https://gist.github.com/1664294. Any ideas?

Thanks!

Henrique

Peter_Vandenabeele1 · January 23, 2012, 5:22pm

FWIW, I was able to reproduce the Iconv::IllegalSequence exception with a simple Ruby program (rvm ruby 1.9.3).

$ wget https://raw.github.com/gist/1664294/17c4e28a1bf87b331c0425e9ddbb48284d096b00/gistfile1.txt
--2012-01-23 18:16:02-- https://raw.github.com/gist/1664294/17c4e28a1bf87b331c0425e9ddbb48284d096b00/gistfile1.txt
Resolving raw.github.com... 207.97.227.243
Connecting to raw.github.com|207.97.227.243|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 50089 (49K) [text/plain]
Saving to: `gistfile1.txt'

100%[======================================>] 50,089 --.-K/s in 0.08s

2012-01-23 18:16:03 (584 KB/s) - `gistfile1.txt' saved [50089/50089]

$ cat convert.rb
@data
File.open('gistfile1.txt') do |f|
  @data = f.read
end

require 'iconv'
Iconv.conv('iso-8859-1//IGNORE', 'utf-8', @data).html_safe

$ ruby convert.rb
/home/peterv/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require': iconv will be deprecated in the future, use String#encode instead.
convert.rb:7:in `conv': " style=\"padding-"... (Iconv::IllegalSequence)
        from convert.rb:7:in `<main>'
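The deprecation warning above points to String#encode. As a rough sketch only (not tested against the gist data), something like the following might stand in for the Iconv call. It assumes the input really is UTF-8 and that silently dropping unconvertible characters, roughly what //IGNORE is meant to do, is acceptable. The sample string is made up for illustration.

```ruby
# Hypothetical sample input; the real data would come from gistfile1.txt.
data = "Grupo Zaffari \u2013 ali\u00E1s"

# Replace invalid byte sequences and characters that have no
# ISO-8859-1 equivalent with an empty string, i.e. drop them.
latin1 = data.encode("ISO-8859-1",
                     :invalid => :replace,
                     :undef   => :replace,
                     :replace => "")

puts latin1.encoding
puts latin1.valid_encoding?
```

Here the en dash (U+2013) is dropped, while "á" survives because ISO-8859-1 has a byte for it.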

I will do a bit more research,

Peter

Peter_Vandenabeele1 · January 23, 2012, 5:54pm

Some relevant links:

http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/

http://blog.grayproductions.net/articles/ruby_19s_string

http://www.ruby-doc.org/core-1.9.3/Encoding/Converter.html#method-i-convert

The code that seems to function fairly well is:

$ cat convert.rb
File.open('gistfile1.txt') do |f|
  f.readlines.each do |line|
    puts "###############################################"
    puts line.valid_encoding? # always true
    ec = Encoding::Converter.new("utf-8", "ISO-8859-1", :undef => :replace)
    ec.replacement = "UNDEFINED"
    puts ec.convert(line)
  end
end

$ ruby convert.rb > result

This code converts your entire document (line by line) without throwing exceptions.

The source text seems to be always valid UTF-8.

But some UTF-8 characters have no equivalent in ISO-8859-1, e.g. the en dash in this piece of text:

"... institucional do Grupo Zaffari – aliás ..."

It shows up in the output as the "UNDEFINED" marker that I defined.

Without the :undef option, the conversion raised:

convert.rb:9:in `convert': U+2013 from UTF-8 to ISO-8859-1 (Encoding::UndefinedConversionError)

That seems quite plausible, since UTF-8 can encode every Unicode code point, while ISO-8859-1 uses a single byte per character and so covers only 256 of them, if I understand correctly.
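To see which characters are affected, a small sketch like this could list them. It relies on ISO-8859-1 mapping one-to-one onto code points U+0000 to U+00FF, so anything above 0xFF cannot be converted. The sample text is hypothetical; the real input would be the contents of gistfile1.txt.

```ruby
# Hypothetical sample text; substitute the contents of gistfile1.txt.
text = "Grupo Zaffari \u2013 ali\u00E1s \u201Cquoted\u201D"

# ISO-8859-1 covers code points U+0000..U+00FF only, so any character
# with a higher code point has no single-byte representation.
offenders = text.each_char.select { |ch| ch.ord > 0xFF }.uniq

offenders.each do |ch|
  puts format("U+%04X %s", ch.ord, ch)
end
```

For this sample, the en dash and the two curly quotes are reported, while "á" (U+00E1) is not.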

I hope this can put you on the right track,

Peter

11155 · January 24, 2012, 12:14am

Thank you very much, Peter! I used your code and replaced these UTF-8-only characters with similar ones from ISO-8859-1 (I tried the transliterate method, but it doesn't seem to work for special characters).
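One possible way to do this kind of substitution in a single pass is the :fallback option of String#encode. This is only a sketch: the SUBSTITUTES table and the to_latin1 helper are hypothetical names, and the table would need to be extended to whatever characters actually occur in the data.

```ruby
# Hypothetical substitution table; extend it as the data requires.
SUBSTITUTES = {
  "\u2013" => "-",   # en dash
  "\u2014" => "-",   # em dash
  "\u201C" => "\"",  # left double quotation mark
  "\u201D" => "\"",  # right double quotation mark
  "\u2026" => "..."  # horizontal ellipsis
}

# Hypothetical helper: converts UTF-8 text to ISO-8859-1, using the table
# for characters with no Latin-1 equivalent and "?" for anything unlisted.
def to_latin1(text)
  text.encode("ISO-8859-1",
              :fallback => lambda { |ch| SUBSTITUTES.fetch(ch, "?") })
end

converted = to_latin1("Grupo Zaffari \u2013 ali\u00E1s\u2026")
puts converted.encode("UTF-8")
```

The fallback is only consulted for characters that have no ISO-8859-1 equivalent, so accented Latin letters such as "á" pass through unchanged.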

Thanks again,

Henrique
