rails2.3/ruby1.9: invalid byte sequence in utf-8 with blank?

hi all,

anyone seen this controller argument error:

invalid byte sequence in utf-8

ror/vendor/rails/activesupport/lib/active_support/core_ext/blank.rb:50:in `=~'
ror/vendor/rails/activesupport/lib/active_support/core_ext/blank.rb:50:in `!~'
ror/vendor/rails/activesupport/lib/active_support/core_ext/blank.rb:50:in `blank?'
ror/vendor/rails/actionpack/lib/action_controller/response.rb:119:in `etag='
ror/vendor/rails/actionpack/lib/action_controller/response.rb:185:in `handle_conditional_get!'
ror/vendor/rails/actionpack/lib/action_controller/response.rb:143:in `prepare!'
ror/vendor/rails/actionpack/lib/action_controller/base.rb:531:in `send_response'
ror/vendor/rails/actionpack/lib/action_controller/base.rb:525:in `process'
ror/vendor/rails/actionpack/lib/action_controller/filters.rb:606:in `process_with_filters'
ror/vendor/rails/actionpack/lib/action_controller/base.rb:391:in `process'
ror/vendor/rails/actionpack/lib/action_controller/base.rb:386:in `call'
ror/vendor/rails/actionpack/lib/action_controller/routing/route_set.rb:433:in `call'

can't seem to figure out what exact string it is choking on, but it occurs when i call an index page without an id to get the total list.


as opposed to:


where 23 is a category of documents.

as an update, i added a rescue clause to blank.rb to find out what was choking:

blank.rb:50

class String #:nodoc:
  def blank?
    self !~ /\S/
  rescue
    raise "#{self.class} #{self.encoding.name} #{self.valid_encoding?} #{self}"
  end
end

i reran my page and checked the log. it turns out that the string it is choking on is the html of my entire page. it says that the encoding is utf-8 but self.valid_encoding? is false. the text of the page follows.

RuntimeError (String UTF-8 false <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> <html> ... the rest of my page
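Since the whole page is one string, it can help to narrow down which bytes make it invalid. The following is only a sketch: `invalid_utf8_offsets` is a hypothetical helper (not part of Rails or Ruby), and the sample string is a made-up stand-in for the real page. It relies on `String#each_char` yielding each undecodable byte as its own one-byte "character" whose `valid_encoding?` is false.

```ruby
# Hypothetical helper: return the byte offsets of every byte that makes a
# UTF-8-tagged string invalid.
def invalid_utf8_offsets(str)
  offsets = []
  pos = 0
  str.dup.force_encoding("UTF-8").each_char do |ch|
    offsets << pos unless ch.valid_encoding?
    pos += ch.bytesize
  end
  offsets
end

# Made-up sample: byte 0x96 is not valid UTF-8 on its own.
page = "<p>1990 \x96 1995</p>".dup.force_encoding("UTF-8")
page.valid_encoding?        # false
invalid_utf8_offsets(page)  # [8]
```

Logging a few bytes around each offset (for example with `byteslice` on newer Rubies) is usually enough to spot the offending field.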

any ideas are appreciated...

Can you provide the Ruby version, Rails version, and platform?

Also, do you have a file(s) to recreate the test? For example, if the error is occurring in Ruby file(s) or ERB template(s), then please copy-paste the relevant file(s) in your post.


thanks conrad,

debian lenny
apache2-mpm-worker
dbd-odbc (0.2.4)
dbi (0.4.1)
deprecated (2.0.1)
fastthread (1.0.7)
json (1.1.4)
passenger (2.1.3)
pg (0.8.0)
rack (0.9.1)
rails-sqlserver-2000-2005-adapter (2.2.15)
rake (0.8.4)
ruby-1.9.1p0 compiled with pthreads, shared
rails2.3 branch in vendor/rails

vendor/rails/activesupport/lib/active_support/core_ext/blank.rb line 50

class Object
  # An object is blank if it's false, empty, or a whitespace string.
  # For example, "", " ", +nil+, [], and {} are blank.

one last update. i discovered that the problem is that i have windows asp content from a legacy server that i use Net::HTTP to import into my rails site. windows allows high ascii codes (in this case byte 150, which in windows-1252 is unicode 8211, the en dash: it resembles a dash but is a fancy, longer windows dash). anyway, someone presumably entered data on a windows box and pasted some stuff into a text field. that text field got saved in sql server fine. but when rails tries to parse the high ascii byte, which isn't valid utf-8, my page crashes with the above error.
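If the legacy pages really are Windows-1252 (an assumption; the server's Content-Type header or the ASP code page setting would confirm it), one option is to retag the imported bytes with their true encoding and transcode them to UTF-8 before they reach any template. A minimal sketch, where `cp1252_to_utf8` is a hypothetical helper and `raw` stands in for the body fetched with Net::HTTP:

```ruby
# Assumption: the imported content is Windows-1252 (CP1252).
# force_encoding only changes the label on the bytes; encode does the
# actual conversion to UTF-8.
def cp1252_to_utf8(raw)
  raw.dup.force_encoding("Windows-1252").encode("UTF-8")
end

# Stand-in for the fetched response body, tagged as raw binary.
raw = "1990 \x96 1995".dup.force_encoding("ASCII-8BIT")

utf8 = cp1252_to_utf8(raw)
utf8.valid_encoding?     # true
utf8 == "1990 \u2013 1995" # true: byte 0x96 became U+2013 (en dash)
```

A handful of byte values (such as 0x81 and 0x8D) have no character assigned in Windows-1252, so `encode` can raise `Encoding::UndefinedConversionError` on them; passing `:undef => :replace` is one way to tolerate that.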

so, i am now investigating how to better sanitize my stuff, but i am also thinking that it is a bit crazy that my entire rails stack can be brought to its knees by one errant character. i would suggest that this is not a "feature", represents a significant problem, and should be made more bulletproof.

i need, i guess, some way to sanitize just the utf-8 encoding, since HTML::FullSanitizer strips all the html. my use case is that the content is my own asp-generated html, so i really don't want to have to whitelist all of the tags.
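One way to clean only the encoding, leaving every tag in place, is to walk the string character by character and replace just the bytes that are not valid UTF-8. This is a sketch under assumptions: `scrub_utf8` is a hypothetical helper and the "?" replacement is an arbitrary choice. Newer Rubies (2.1 and later) ship `String#scrub`, which does the same job, but it is not available on 1.9.

```ruby
# Hypothetical helper: replace bytes that are not valid UTF-8 and leave
# everything else, including HTML markup, untouched.
def scrub_utf8(str, replacement = "?")
  s = str.dup.force_encoding("UTF-8")
  return s if s.valid_encoding?
  # each_char yields every undecodable byte as its own invalid "character".
  s.each_char.map { |ch| ch.valid_encoding? ? ch : replacement }.join
end

dirty = "<p>1990 \x96 1995</p>"
clean = scrub_utf8(dirty)
clean                  # "<p>1990 ? 1995</p>"
clean.valid_encoding?  # true
```

The trade-off is that the original character is lost; when the source encoding is known, transcoding from it preserves the text instead of replacing it.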

also, why is it that in rails2.3 a string class variable doesn't have ruby1.9 encoding methods exposed as they do in irb?