Working round 'invalid byte sequence'

I am a very amateur Rubyist who, amongst other things, likes to use a
simple Rails app to query my company's MySQL config database. The
server I now use for this runs Ruby 1.9.1 and Rails 2.3.3. I've now hit
the 'problems' related to 1.9 and string encoding: when Rails tries to
display, say, E-acute characters, it raises an invalid byte sequence
error, namely
ArgumentError (invalid byte sequence in UTF-8):

Given that I only access the MySQL database over a private network and
with a read-only account, is there some simple and easy way to suppress
this issue? Without being an expert in this area (obviously) I guess
that either I can try to "tell" Ruby to treat the MySQL data as an
encoding other than UTF-8 (I guess US-ASCII but it could be trial and
error to work out what), and/or I could add some rescue code to find
(and ignore) bad byte sequences. I've tried to find recipes for both
the above, but quickly get lost in the subtleties of it all! Any and
all help appreciated. Many thanks in advance.

I'd check with whoever admins the MySQL DB to find out what
character set it's actually using. I think you can then tell the
adapter to translate. Best guess is either US ASCII, or (more likely)
Windows-1252 pretending to be ASCII.
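
If the adapter can't be told to translate, the conversion can also be done in plain Ruby 1.9 once the real character set is known. A minimal sketch, assuming the bytes really are Windows-1252 (a guess until the DB admin confirms it); `to_utf8` is a hypothetical helper name, not part of Rails or the mysql driver:

```ruby
# Hypothetical helper, not a Rails or driver API.
# Relabels raw bytes with the encoding they are ASSUMED to be in,
# then transcodes them to UTF-8.
def to_utf8(raw, source_encoding = "Windows-1252")
  # dup so the caller's string keeps its original encoding tag
  raw.dup.force_encoding(source_encoding).encode("UTF-8")
end

# 0xE9 is E acute in Windows-1252
raw  = "caf\xE9".dup.force_encoding("ASCII-8BIT")
name = to_utf8(raw)

puts name.encoding         # UTF-8
puts name.valid_encoding?  # true
```

If the guess about the source encoding is wrong, the result will still be valid UTF-8 but may show the wrong characters, so it is worth checking a few known names by eye.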

--Matt Jones

Many thanks for the reply, Matt. I used the console to determine that
the db is serving up ASCII-8BIT:

e = Equipment.find(:first, :conditions => ['id = ?', 1234])
e.name.encoding

=> #<Encoding:ASCII-8BIT>

I then set the encoding in /config/database.yml to 'ascii'. Although
this can't display special characters, at least the page now renders,
with "?" in place of the accented characters. I tried setting the
encoding to "ascii-8bit" and variations of it, but each time Rails
complained, so if anyone can tell me how to specify ASCII-8BIT I'd be
grateful.
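
For comparison, the same "?" fallback can be had in plain Ruby without touching database.yml. This is only a sketch: `lossy_utf8` is a hypothetical helper name, and it assumes that losing the accented characters is acceptable.

```ruby
# Hypothetical helper: returns a UTF-8 copy of +raw+ in which every byte
# that cannot be interpreted is replaced by "?".
def lossy_utf8(raw)
  candidate = raw.dup.force_encoding("UTF-8")
  return candidate if candidate.valid_encoding? # bytes were already good UTF-8

  # Otherwise treat the data as raw bytes: ASCII passes through, and each
  # byte >= 0x80 has no UTF-8 mapping, so it is replaced.
  raw.dup.force_encoding("ASCII-8BIT").encode(
    "UTF-8", :invalid => :replace, :undef => :replace, :replace => "?"
  )
end

latin = "caf\xE9".dup.force_encoding("ASCII-8BIT")      # not valid UTF-8
utf8  = "caf\xC3\xA9".dup.force_encoding("ASCII-8BIT")  # valid UTF-8 bytes

puts lossy_utf8(latin)  # caf?
puts lossy_utf8(utf8)   # café
```

The result is always a valid UTF-8 string, so the view should no longer raise, but the information in the replaced bytes is gone.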

Toby Rodwell wrote:
[...]

This doesn't solve your immediate problem, but...if your host locks the
DB in ASCII 8-bit and you can't change it, then find a new host. That
encoding is inappropriate for real work. :)

Best,

This is not about the database itself. It has to do with the
interaction between the mysql driver and the new string encoding
scheme. Strings in Ruby 1.9 are encoding-aware, and from what I
gather the mysql driver creates strings tagged as ASCII-8BIT
regardless of their actual encoding. (My very vague understanding is
that ASCII-8BIT is a sort of pseudo-encoding that doesn't actually mean
ASCII; it just means raw bytes.)
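
To illustrate the "raw bytes" point, here is a small sketch in plain Ruby 1.9. It assumes, purely for the example, that the bytes coming back are really UTF-8 that has been tagged as ASCII-8BIT; the string below stands in for what the driver returns.

```ruby
# Stand-in for a driver result: UTF-8 bytes for "café", tagged ASCII-8BIT
raw = "caf\xC3\xA9".dup.force_encoding("ASCII-8BIT")

puts raw.encoding == Encoding::BINARY  # true: BINARY is an alias for ASCII-8BIT
puts raw.bytesize                      # 5
puts raw.length                        # 5, every byte counts as one character

# force_encoding only changes the tag; no bytes are converted
text = raw.dup.force_encoding("UTF-8")

puts text.bytes.to_a == raw.bytes.to_a # true
puts text.length                       # 4, the last two bytes form one character
puts text.valid_encoding?              # true
```

If `valid_encoding?` came back false here, the data would not be UTF-8 after all, and relabelling it would just move the error to wherever the string is next used.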

There is quite a lot of discussion on lighthouse here:
https://rails.lighthouseapp.com/projects/8994/tickets/2476-ascii-8bit-encoding-of-query-results-in-rails-232-and-ruby-191#ticket-2476-2
although there is no clear resolution that I could see. It may provide
some help to Toby. One way out would be to fall back to Ruby 1.8.x,
where these problems do not exist because strings are just dumb
collections of bytes.

Fred