As an update, I added a rescue clause to blank.rb (line 50) to find out what was
choking:
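class String #:nodoc:
  def blank?
    self !~ /\S/
  rescue
    # Diagnostic only: re-raise with the string's class, encoding name,
    # encoding validity, and contents so they show up in the log.
    raise "#{self.class} #{self.encoding.name} #{self.valid_encoding?} #{self}"
  end
end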
I reran my page and checked the log. It turns out the string that chokes is
the HTML of my entire page: its encoding is reported as UTF-8, but
valid_encoding? returns false. The log entry, including the page text, follows.
RuntimeError (String UTF-8 false
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
... the rest of my page
vendor/rails/activesupport/lib/active_support/core_ext/blank.rb line 50
class Object
# An object is blank if it's false, empty, or a whitespace string.
# For example, "", " ", +nil+, [], and {} are blank.
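The failure can be reproduced outside Rails. The sketch below (assuming Ruby 1.9 or later) builds a made-up string that is tagged UTF-8 but contains a stray 0x96 byte. It shows that regexp matching on that string raises ArgumentError, just as blank? did, and it uses a hypothetical helper, `first_invalid_byte_index`, to locate the offending byte.

```ruby
# Made-up sample: "<p>10", then byte 0x96 (a Windows-1252 en dash), then
# "20</p>", all tagged as UTF-8. 0x96 on its own is not valid UTF-8.
html = [*"<p>10".bytes, 0x96, *"20</p>".bytes].pack("C*").force_encoding("UTF-8")

puts html.encoding        # UTF-8
puts html.valid_encoding? # false

# Regexp matching on a string with an invalid byte raises ArgumentError,
# which is what the original `self !~ /\S/` in blank? ran into.
begin
  html !~ /\S/
rescue ArgumentError => e
  puts e.message          # invalid byte sequence in UTF-8
end

# Hypothetical helper: byte offset of the first invalid character, or nil.
# each_char yields an invalid byte as its own one-byte "character".
def first_invalid_byte_index(str)
  offset = 0
  str.each_char do |ch|
    return offset unless ch.valid_encoding?
    offset += ch.bytesize
  end
  nil
end

puts first_invalid_byte_index(html) # 5
```

Knowing the byte offset makes it easy to slice out the surrounding bytes and check which source record the bad character came from.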
One last update: I found the cause. I have Windows ASP content on a legacy
server that I import into my Rails site with Net::HTTP. That content uses the
Windows-1252 code page, which assigns characters to byte values above 127
(often loosely called "high ASCII"). In this case it is byte 150 (0x96), which
Windows-1252 maps to Unicode 8211 (U+2013, the en dash): it looks like a hyphen
but is the longer Windows dash. Presumably someone entered the data on a
Windows box by pasting text into a form field. SQL Server stored that field
without complaint, but when Rails processes the byte, which is not valid
UTF-8, my page crashes with the error above.
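To confirm the mapping described above, the single byte 150 can be labelled as Windows-1252 and transcoded to UTF-8. A minimal check, assuming Ruby 1.9 or later with its built-in Windows-1252 converter:

```ruby
# Byte 150 (0x96) as it arrives from the legacy data, labelled Windows-1252.
raw = [150].pack("C").force_encoding("Windows-1252")

# The same byte labelled as UTF-8 is an invalid sequence.
puts raw.dup.force_encoding("UTF-8").valid_encoding? # false

# Transcoding through the Windows-1252 table yields the en dash.
dash = raw.encode("UTF-8")
puts dash.ord           # 8211 (U+2013)
puts dash.bytes.inspect # [226, 128, 147], i.e. three bytes in UTF-8
```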
So I am now looking into how to sanitize this content better, but I also think
it is a bit crazy that one errant character can bring my entire Rails stack to
its knees. I would suggest that this is not a "feature", that it represents a
significant problem, and that it should be made more bulletproof.
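One way to keep a single character from taking down the page is to normalize the content at the import boundary, right after Net::HTTP returns it. The sketch below assumes the legacy server's pages are Windows-1252 whenever they are not already valid UTF-8; `legacy_html_to_utf8` is a hypothetical helper, not a Rails API. Unlike stripping, it keeps the en dash.

```ruby
# Hypothetical helper: normalize HTML fetched from the legacy ASP server to
# valid UTF-8 before Rails touches it.
# Assumption: anything that is not already valid UTF-8 is Windows-1252.
def legacy_html_to_utf8(raw)
  utf8 = raw.dup.force_encoding("UTF-8")
  return utf8 if utf8.valid_encoding?

  # Reinterpret the bytes as Windows-1252 and transcode. Bytes that
  # Windows-1252 leaves undefined become "?" instead of raising.
  cp1252 = raw.dup.force_encoding("Windows-1252")
  cp1252.encode("UTF-8", invalid: :replace, undef: :replace, replace: "?")
end

# Usage with a made-up body containing byte 0x96:
body  = [*"<p>10".bytes, 150, *"20</p>".bytes].pack("C*")
clean = legacy_html_to_utf8(body)
puts clean.valid_encoding?            # true
puts clean == "<p>10\u201320</p>"     # true
```

In the import code this would be applied to the response body, e.g. `legacy_html_to_utf8(response.body)`. The valid-UTF-8 check is a heuristic: Windows-1252 text that happens to be valid UTF-8 (such as pure ASCII) passes through unchanged, which is harmless for ASCII.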
I guess what I need is some way to sanitize just the invalid UTF-8, since
HTML::FullSanitizer strips all of the HTML. In my case the content is my own
ASP-generated HTML, so I really don't want to have to whitelist every tag.
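If the source encoding is not known, a fallback is to replace only the invalid byte sequences and leave every tag alone, the opposite trade-off from HTML::FullSanitizer. The helper `scrub_invalid_utf8` below is hypothetical. `String#scrub` exists only in Ruby 2.1 and later, so on older interpreters the helper falls back to the commonly used round-trip through UTF-16; that branch is included on the assumption that it behaves the same on 1.9.

```ruby
# Hypothetical helper: replace invalid UTF-8 byte sequences but leave all
# markup untouched.
def scrub_invalid_utf8(html, replacement = "\uFFFD")
  str = html.dup.force_encoding("UTF-8")
  return str if str.valid_encoding?

  if str.respond_to?(:scrub)
    str.scrub(replacement) # Ruby 2.1+
  else
    # Older Rubies: UTF-8 to UTF-8 is a no-op, so round-trip through
    # UTF-16BE, replacing invalid sequences on the way out.
    utf16 = str.encode("UTF-16BE", invalid: :replace, replace: replacement)
    utf16.encode("UTF-8")
  end
end

broken = [*"<b>10".bytes, 150, *"20</b>".bytes].pack("C*")
puts scrub_invalid_utf8(broken, "-") # <b>10-20</b>
```

Unlike transcoding from Windows-1252, this loses the original character (the dash becomes U+FFFD or whatever replacement is passed), so it works better as a safety net than as the primary fix.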
Also, why is it that in Rails 2.3 a String variable doesn't expose the Ruby 1.9
encoding methods the way it does in irb?
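A quick way to compare the two environments is to print the interpreter version and whether String responds to the 1.9 encoding methods, once from irb and once from inside the running app (for example, logged from a controller action). If the versions differ, irb and the app are not running the same Ruby. `encoding_support_report` is a hypothetical helper; it uses hash-rocket syntax so that it also parses under Ruby 1.8.

```ruby
# Hypothetical diagnostic: which Ruby is running here, and does String
# expose the Ruby 1.9 encoding API in this process?
def encoding_support_report
  sample = "x"
  {
    :ruby_version       => RUBY_VERSION,
    :ruby_platform      => RUBY_PLATFORM,
    :has_encoding       => sample.respond_to?(:encoding),
    :has_force_encoding => sample.respond_to?(:force_encoding),
    :has_valid_encoding => sample.respond_to?(:valid_encoding?)
  }
end

puts encoding_support_report.inspect
```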