hi all,
platform: debian lenny, ruby1.9.1p0, passenger/apache-multithread, rails2.3 in vendor, postgres and sql server via odbc. all current gems.
i have legacy asp content on win2k servers that i wrap in rails controllers. this all worked great with ruby1.8, but now that we are dealing with encoded strings in ruby1.9, i am getting random page crashes wherever users have cut and pasted "high ascii" characters (e.g. byte 150, a fancy dash in windows-1252) that are ms only and non-standard.
normally, i just wouldn't have cared or even worried about it that much; however, after a few mysterious rails page crashes i did more experimenting. i found that if i put the following in my asp page, it will cause the rails page to fail with "invalid byte sequence in utf-8" at ror/vendor/rails/activesupport/lib/active_support/core_ext/blank.rb:50
the offending asp code is:
<%= chr(150) %>
this is my own doing to reproduce the issue, but there are many non-standard windows characters that are not utf-8 compliant and that probably riddle my sql server database, because users like to cut and paste content from word and other places.
it turns out that because the content that i bring in via ruby net::http has non-utf8 characters, the encoding is set to ASCII-8BIT, and when i do force_encoding('utf-8'), valid_encoding? is false and the page just fails. html::sanitize isn't an option as i don't want to strip the tags. the content is from internal trusted servers that i control. i just need to sanitize, i guess, the bad characters.
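a minimal sketch of the failure described above and one possible workaround, assuming the iis pages are really windows-1252 (that is a guess; it is not confirmed anywhere in this post). `to_utf8` is a hypothetical helper name. it uses ruby 1.9's built-in String#encode rather than iconv, and replaces anything it cannot map with "?" instead of raising.

```ruby
# hypothetical helper: reinterpret raw bytes as a single-byte source encoding
# (windows-1252 is an assumption) and transcode them to utf-8.
def to_utf8(raw, source_encoding = "Windows-1252")
  raw.dup.force_encoding(source_encoding).encode(
    "UTF-8", :invalid => :replace, :undef => :replace, :replace => "?"
  )
end

# simulate a net::http body: the bytes "10", byte 150, "20", tagged ASCII-8BIT
raw = [0x31, 0x30, 0x96, 0x32, 0x30].pack("C*")

# relabelling the bytes as utf-8 does not change them, so the string is invalid
as_utf8 = raw.dup.force_encoding("UTF-8")
puts as_utf8.valid_encoding?

# transcoding from the assumed source encoding yields a valid utf-8 string
fixed = to_utf8(raw)
puts fixed.encoding
puts fixed.valid_encoding?
```

the point of the sketch is that force_encoding only relabels bytes, while encode actually converts them, so byte 150 becomes a proper utf-8 en dash instead of an invalid sequence.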
my thoughts/questions:
1) seems like rails should be less brittle about managing encoding, such that blank? doesn't just fail when valid_encoding? is false. or you shouldn't be able to create a string if the encoding is bad. or it should make best efforts to transliterate the bad characters. something.
2) is iconv my best option? seems kind of nuts that i have to re-encode the entire html page for one character. this does work: using the translit//ignore options i get my pages, but i wonder about the overhead.
3) as usual, trying to make my ms iis5 servers do anything useful is a non-starter. sure it says it can generate utf-8, but trying it the (typically confused and poorly documented) 25 different ways to make it do so, results in nothing but more wasted time. so i need a good rails solution that "just works."
4) it occurs to me that it could also be that ruby is setting the default to ascii for net::http regardless of how iis is sending it. how do i check/set Encoding.default_external in rails? why does rails remove the Encoding class? it isn't there in console, but is in irb. i dislike rails removing native ruby classes.
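for question 4, a rough sketch of how the defaults can be inspected and set in plain ruby 1.9, plus one way to honour whatever charset the server declares. `charset_from` and `tag_body` are hypothetical helpers, the content-type value is made up for illustration, and the windows-1252 fallback is an assumption. as far as i know net::http hands the body back as ASCII-8BIT whatever the defaults are, so the body still has to be tagged by hand.

```ruby
# hypothetical helper: pull the charset parameter out of a content-type value
def charset_from(content_type)
  content_type.to_s[/charset\s*=\s*"?([^\s;"]+)/i, 1]
end

# hypothetical helper: tag raw bytes with the declared charset, falling back
# to windows-1252 (an assumption) when none is declared or ruby does not know it
def tag_body(raw, content_type, fallback = "Windows-1252")
  name = charset_from(content_type) || fallback
  encoding = begin
    Encoding.find(name)
  rescue ArgumentError
    Encoding.find(fallback)
  end
  raw.dup.force_encoding(encoding)
end

# check the process-wide defaults, then set the external one
puts "default_external: #{Encoding.default_external}"
puts "default_internal: #{Encoding.default_internal.inspect}"
Encoding.default_external = Encoding::UTF_8

# "a", byte 150, "b" as ASCII-8BIT, the way a response body would arrive
raw    = [0x61, 0x96, 0x62].pack("C*")
tagged = tag_body(raw, "text/html; charset=windows-1252")
puts tagged.encoding
puts tagged.encode("UTF-8").valid_encoding?
```

in a rails 2.3 app the default_external line would presumably have to run early, before any templates or strings are loaded; where exactly is a guess on my part.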
please. i am so close to having ruby1.9/rails2.3 working, but this encoding stuff is really a hassle.