Working round 'invalid byte sequence'

I am a very amateur Rubyist who, amongst other things, likes to use a simple Rails app to query my company's MySQL config database. The server I now use for this runs Ruby 1.9.1 and Rails 2.3.3. I've now hit the 'problems' related to 1.9 and string encoding: when Rails tries to display, say, an e-acute (é) character, it raises ArgumentError (invalid byte sequence in UTF-8).

Given that I only access the MySQL database over a private network and with a read-only account, is there some simple and easy way to suppress this issue? Not being an expert in this area (obviously), I guess I can either try to "tell" Ruby to treat the MySQL data as an encoding other than UTF-8 (I guess US-ASCII, but working out which could take trial and error), and/or add some rescue code to find (and ignore) bad byte sequences. I've tried to find recipes for both of the above, but quickly get lost in the subtleties of it all! Any and all help appreciated. Many thanks in advance.
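A minimal sketch of the second idea (finding bad byte sequences and replacing them rather than letting them raise), assuming the bytes are meant to be UTF-8. The helper name clean_utf8 is hypothetical, and String#scrub only exists from Ruby 2.1 onwards, so this will not run as-is on the 1.9.1 mentioned above.

```ruby
# Hypothetical helper: return a valid UTF-8 copy of +str+, replacing
# any invalid byte sequences instead of raising ArgumentError later.
# Assumes the underlying bytes are intended to be UTF-8.
def clean_utf8(str, replacement = "?")
  # Work on a copy so the caller's string is not relabelled in place.
  s = str.dup.force_encoding(Encoding::UTF_8)
  return s if s.valid_encoding?

  # String#scrub (Ruby 2.1+) swaps each invalid sequence for +replacement+.
  s.scrub(replacement)
end

clean_utf8("caf\xC3\xA9") # => "café" (already valid UTF-8)
clean_utf8("caf\xE9")     # => "caf?" (lone 0xE9 byte replaced)
```

Passing an empty replacement, as in clean_utf8(str, ""), drops the bad bytes instead, which is closer to "ignore" in the question.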

I'd check with whoever administers the MySQL DB to find out what character set it's actually using. I think you can then tell the adapter to translate. My best guess is either US-ASCII or (more likely) Windows-1252 pretending to be ASCII.

--Matt Jones
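To make Matt's suggestion concrete: if the column really holds Windows-1252 bytes, one Ruby 1.9+ approach is to relabel the string with that encoding and then transcode it to UTF-8. A sketch, with Windows-1252 as an assumption (it is only Matt's best guess) and from_db as a hypothetical name:

```ruby
# Hypothetical helper: reinterpret raw bytes as +source_encoding+ and
# transcode them to UTF-8. Windows-1252 is an assumed default.
def from_db(raw, source_encoding = Encoding::Windows_1252)
  raw.dup.force_encoding(source_encoding).encode(Encoding::UTF_8)
end

bytes = "caf\xE9".dup.force_encoding(Encoding::ASCII_8BIT) # 0xE9 is é in Windows-1252
from_db(bytes) # => "café"

# US-ASCII only covers bytes 0x00-0x7F, so the same bytes are invalid there.
bytes.dup.force_encoding(Encoding::US_ASCII).valid_encoding? # => false
```

A few byte values are unassigned in Windows-1252, so passing undef: :replace to encode may be prudent if the data is messy.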

Many thanks for the reply, Matt. I used the console to determine that the DB is serving up ASCII-8BIT:

e = Equipment.find(:first, :conditions => ['id = ?', 1234])
e.name.encoding
=> #<Encoding:ASCII-8BIT>
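Since ASCII-8BIT only says "raw bytes", the console result doesn't reveal what encoding the data is really in. One rough check, sketched below, is to see whether the bytes happen to form valid UTF-8 and otherwise assume a single-byte encoding. guess_source_encoding is a hypothetical helper, the Windows-1252 fallback is a guess rather than something MySQL reports, and some Windows-1252 text (such as "Ã©") is also valid UTF-8, so treat the answer as a hint.

```ruby
# Hypothetical helper: a heuristic guess at what encoding a binary
# (ASCII-8BIT) string from the driver is really in.
def guess_source_encoding(raw)
  bytes = raw.dup.force_encoding(Encoding::ASCII_8BIT)
  return Encoding::US_ASCII if bytes.ascii_only?

  # Valid UTF-8 is unlikely to occur by accident in single-byte text.
  return Encoding::UTF_8 if bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?

  # Assumption: anything else is treated as Windows-1252.
  Encoding::Windows_1252
end

guess_source_encoding("cafe")        # => #<Encoding:US-ASCII>
guess_source_encoding("caf\xC3\xA9") # => #<Encoding:UTF-8>
guess_source_encoding("caf\xE9")     # => #<Encoding:Windows-1252>
```

In the console this would be called on a loaded attribute, e.g. guess_source_encoding(e.name).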

I then set the encoding in /config/database.yml to 'ascii'. That can't display special characters, but at least the page renders, with "?" in place of the accented characters. I tried setting the encoding to "ascii-8bit" and variations on it, but each time Rails complained, so if anyone can tell me how to specify ASCII-8BIT I'd be grateful.
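For what it's worth, the "?" substitution is ordinary transcoding behaviour: when text containing é is converted to US-ASCII with replacement enabled, each character with no ASCII equivalent becomes "?". In the setup above the conversion presumably happens on the MySQL side once the connection is set to 'ascii'; the Ruby below only reproduces the same effect for illustration. (MySQL's own character-set names, such as latin1 or utf8, differ from Ruby's encoding names, which may be why "ascii-8bit" is rejected.)

```ruby
# Illustration only: convert UTF-8 text to US-ASCII, replacing any
# character US-ASCII cannot represent with "?".
utf8  = "caf\u00E9" # "café"
ascii = utf8.encode(Encoding::US_ASCII, undef: :replace, replace: "?")
ascii          # => "caf?"
ascii.encoding # => #<Encoding:US-ASCII>
```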

Toby Rodwell wrote: [...]

This doesn't solve your immediate problem, but... if your host locks the DB into ASCII-8BIT and you can't change it, find a new host. That encoding is inappropriate for real work. :)

Best,

Toby Rodwell wrote:

[...]

This is not about the database itself; it's to do with the interaction between the MySQL driver and the new string encoding schemes. Strings in Ruby 1.9 are encoding-aware, and from what I gather the MySQL driver creates strings with the ASCII-8BIT encoding regardless of their actual encoding. (My very vague understanding is that ASCII-8BIT is a sort of pseudo-encoding that doesn't actually mean ASCII; it just means raw bytes.)
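A small sketch of what "ASCII-8BIT just means raw bytes" looks like in Ruby 1.9+, using a literal that stands in for a value returned by the driver (an assumption; no database is involved). force_encoding changes only the label, not the bytes, and combining a non-ASCII binary string with non-ASCII UTF-8 text raises Encoding::CompatibilityError.

```ruby
# Stand-in for a driver result: UTF-8 bytes for "café", labelled binary.
from_driver = "caf\xC3\xA9".dup.force_encoding(Encoding::ASCII_8BIT)
Encoding::ASCII_8BIT == Encoding::BINARY # => true (two names, one encoding)
from_driver.length                       # => 5 (counts bytes, not characters)

# Relabelling keeps the same bytes but lets Ruby see four characters.
relabelled = from_driver.dup.force_encoding(Encoding::UTF_8)
relabelled.length                        # => 4
relabelled == "caf\u00E9"                # => true

# Mixing non-ASCII binary and non-ASCII UTF-8 strings is refused.
mix_error =
  begin
    from_driver + "\u00E9"
    nil
  rescue Encoding::CompatibilityError => e
    e
  end
mix_error.class                          # => Encoding::CompatibilityError
```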

There is quite a lot of discussion on Lighthouse here: https://rails.lighthouseapp.com/projects/8994/tickets/2476-ascii-8bit-encoding-of-query-results-in-rails-232-and-ruby-191#ticket-2476-2 although I couldn't see a clear resolution. It may provide some help to Toby. One way out would be to fall back to Ruby 1.8.x, where these problems do not exist because strings are just dumb collections of bytes.

Fred
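Short of going back to 1.8, one possible workaround is to relabel query results yourself. Below is a plain-Ruby sketch that does this for a hash of attribute values; utf8_attributes is a hypothetical name, wiring it into Rails is not shown, and it assumes the stored bytes really are UTF-8, which is worth checking before relying on it.

```ruby
# Hypothetical helper: relabel every ASCII-8BIT string in a hash of
# attribute values as UTF-8, leaving other values untouched.
# Strings whose bytes are not valid UTF-8 are returned unchanged.
def utf8_attributes(attrs)
  attrs.each_with_object({}) do |(key, value), out|
    out[key] =
      if value.is_a?(String) && value.encoding == Encoding::ASCII_8BIT
        utf8 = value.dup.force_encoding(Encoding::UTF_8)
        utf8.valid_encoding? ? utf8 : value
      else
        value
      end
  end
end

row = { "id" => 1234, "name" => "caf\xC3\xA9".dup.force_encoding(Encoding::ASCII_8BIT) }
utf8_attributes(row)["name"] # => "café"
```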