platform: debian lenny, ruby 1.9.1-p0, passenger/apache-multithread,
rails2.3 in vendor, postgres and sql server via odbc. all current gems.
i have legacy asp content on win2k servers that i wrap in rails
controllers. this all worked great with ruby1.8, but now that we are
dealing with encoded strings in ruby1.9, i am having page crashes
randomly as users have cut and pasted high-ascii characters (e.g.
byte 150, a fancy dash in windows-1252) that are ms-only and non-standard.
normally i wouldn't have cared or worried about it that much; however,
after a few mysterious rails page crashes i did more experimenting. i
found that if i put
the following in my asp page, it will cause the rails page to fail
with "invalid byte sequence in utf-8" ror/vendor/rails/activesupport/
the offending asp code is:
<%= chr(150) %>
this is my own doing to reproduce the issue, but many non-standard
windows characters that are not valid utf-8 probably riddle my sql
server database, because users like to cut and paste content from word
and other places.
it turns out that because the content i bring in via ruby Net::HTTP
has non-utf8 characters, the encoding is set to ASCII-8BIT, and when i
do force_encoding("utf-8"), valid_encoding? is false and the page just
fails. html::sanitize isn't an option as i don't want to strip the
tags. the content is from internal trusted servers that i control. i
just need to sanitize, i guess, the bad characters.
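a minimal sketch of the failure and one possible fix, assuming the bytes
coming from iis are really windows-1252 (an assumption on my part; byte
150 is the en dash in that code page). the helper name to_utf8 is
hypothetical, and it uses String#encode rather than iconv:

```ruby
# Hypothetical helper: return a valid UTF-8 copy of a raw HTTP body.
# Assumes non-UTF-8 input is Windows-1252 (an assumption, not confirmed).
def to_utf8(raw, source_encoding = "Windows-1252")
  text = raw.dup.force_encoding("UTF-8")
  return text if text.valid_encoding?   # already valid UTF-8, leave it alone

  # Relabel the bytes with their assumed real encoding, then transcode.
  # Bytes that Windows-1252 leaves undefined (e.g. 0x81) become "?".
  raw.dup.force_encoding(source_encoding).
    encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => "?")
end

# Simulate what <%= chr(150) %> sends: the bytes of "a", space, 150, space, "b".
body = [97, 32, 150, 32, 98].pack("C*")     # pack returns an ASCII-8BIT string

puts body.dup.force_encoding("UTF-8").valid_encoding?   # => false
clean = to_utf8(body)
puts clean.valid_encoding?                               # => true
puts clean == "a \u2013 b"                               # => true (U+2013, en dash)
```

since the html tags are plain ascii they pass through untouched; only
the high bytes get remapped.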
1) seems like rails should be less brittle about managing encoding,
such that blank? doesn't just fail when valid_encoding? is false. or
you shouldn't be able to create a string if the encoding is bad. or it
should make a best effort to transliterate the bad characters.
2) is iconv my best option? it seems kind of nuts that i have to
re-encode the entire html page for one character. this does work: using
the //translit//ignore options i get my pages, but i wonder at the
3) as usual, trying to make my ms iis5 servers do anything useful is a
non-starter. sure, it says it can generate utf-8, but trying the 25
different (typically confusing and poorly documented) ways to make it
do so results in nothing but more wasted time. so i need a good
rails solution that "just works."
4) it occurs to me that it could also be that ruby is setting the
default to ascii for Net::HTTP regardless of how iis is sending it.
how do i check/set Encoding.default_external in rails? why does
rails remove the Encoding class? it isn't there in console, but is in
irb. i dislike rails removing native ruby classes.
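for 2) and 4), a rough sketch of how the defaults can be inspected in
plain ruby 1.9, plus a lossy cleanup similar in spirit to iconv's
//IGNORE but built on String#encode instead. strip_high_bytes is a
hypothetical name; it throws away every non-ascii byte, so it is a last
resort rather than a recommendation:

```ruby
# Show what ruby assumes for data read from files and sockets.
# The values depend on the locale and on any -E / -K switches.
puts "default_external: #{Encoding.default_external}"
puts "default_internal: #{Encoding.default_internal.inspect}"   # nil unless set

# The default can be changed at runtime. Where to put this in a rails 2.3
# app is an assumption here (somewhere early in boot):
#   Encoding.default_external = Encoding::UTF_8

# Hypothetical last-resort cleanup: treat the input as raw bytes and drop
# everything that is not 7-bit ascii. This also removes valid utf-8
# multibyte characters, so it is lossy by design.
def strip_high_bytes(raw)
  raw.dup.force_encoding("ASCII-8BIT").
    encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => "")
end

# The bytes of "<p>a", then byte 150, then "b</p>".
body = [60, 112, 62, 97, 150, 98, 60, 47, 112, 62].pack("C*")
puts body.encoding == Encoding::ASCII_8BIT   # => true
puts strip_high_bytes(body)                  # prints <p>ab</p>
```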
please. i am so close to having ruby1.9/rails2.3 working, but this
encoding stuff is really a hassle.