Microsoft "stupid quotes" in params

I'm looking for feedback on this; please read on...

I'm just fed up with Microsoft's "stupid quotes" feature (and for the sake
of later searchers, I'll add that it's often known as "smart quotes",
although as with anything Microsoft you're safe to substitute the word
"stupid" anywhere they use the word "smart").

I just completed a nice application, and suddenly an external piece
failed. It first uses xmlrpc to grab some data from the database and
stick it in a yaml file. A couple of other programs read the yaml file
and create various other files. Those programs were crapping out because
they couldn't read the entire yaml file.

It turns out that the problem was with people using stupid quotes.
Here's the sledgehammer that I applied in
app/controllers/application.rb:

  before_filter :fix_stupid_quotes_in_params

  def fix_stupid_quotes_in_params
    dig_deep(@params) { |s| fix_stupid_quotes!(s) }
  end

  # Yields every String found in a (possibly nested) Hash.
  def dig_deep(value, &block)
    if value.instance_of? String
      yield(value)
    elsif value.kind_of? Hash
      value.each_value { |v| dig_deep(v, &block) }
    end
  end

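  # Replaces the Windows-only bytes with plain ASCII stand-ins, in place.
  def fix_stupid_quotes!(s)
    s.gsub!(/\x82/,',')        # single low quote
    s.gsub!(/\x84/,',,')       # double low quote
    s.gsub!(/\x85/,'...')      # ellipsis
    s.gsub!(/\x88/,'^')        # circumflex
    s.gsub!(/\x89/,'o/oo')     # per mille
    s.gsub!(/\x8b/,'<')        # left angle quote
    s.gsub!(/\x8c/,'OE')       # OE ligature
    s.gsub!(/\x91|\x92/,"'")   # single curly quotes
    s.gsub!(/\x93|\x94/,'"')   # double curly quotes
    s.gsub!(/\x95/,'*')        # bullet
    s.gsub!(/\x96/,'-')        # en dash
    s.gsub!(/\x97/,'--')       # em dash
    s.gsub!(/\x98/,'~')        # small tilde
    s.gsub!(/\x99/,'TM')       # trademark
    s.gsub!(/\x9b/,'>')        # right angle quote
    s.gsub!(/\x9c/,'oe')       # oe ligature
  end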

If this is a bad idea, I'll have to implement it on one particular page.
The fact is, though, that these bytes are never printable characters in
ISO-8859-1 (they fall in the C1 control range) and are not valid on their
own in UTF-8, so I see no reason to allow them through ever. I hate
modifying the params, but again, these are just not valid characters. I
don't want to have to think about it in each model or controller.

This is a sledgehammer approach, as it will always walk through params
on every page and fix the stupid quotes characters. I'm looking for any
thoughts, suggestions, comments, etc. on the above code.
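
One gap in the walker above is that it skips arrays, which also show up
in request parameters (for fields named like ids[]). Here is a minimal
sketch of a walker that handles both, written as plain Ruby 1.9+ so it
runs outside Rails. The name deep_each_string and the sample params
hash are made up for illustration; they are not part of the code above.

```ruby
# Hypothetical helper: yields every String found in a nested
# structure of Hashes and Arrays; other value types are skipped.
def deep_each_string(value, &block)
  case value
  when String then yield value
  when Hash   then value.each_value { |v| deep_each_string(v, &block) }
  when Array  then value.each { |v| deep_each_string(v, &block) }
  end
end

# Made-up sample standing in for request parameters (binary strings).
params = {
  "title" => "it\x92s".b,
  "tags"  => ["\x93new\x94".b, "plain".b],
  "post"  => { "body" => "wait\x85".b, "id" => 7 }
}

# Apply three of the substitutions to every string, in place.
deep_each_string(params) do |s|
  s.gsub!(/\x91|\x92/n, "'")
  s.gsub!(/\x93|\x94/n, '"')
  s.gsub!(/\x85/n, "...")
end
```

Passing the block along with &block keeps the recursion to one line per
container type, and non-string leaves such as the integer id are left
alone.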

Thanks,
Michael

IIRC, these are double-byte characters. The problem is not so much in
using them, but in interpreting them. For example, a user types stuff
into Word, copies, then pastes into a textarea. Hits send. Application
obediently stores in database. Application displays data and
s(mart|upid) quotes are in place correctly. Then programmer gets a wild
idea -- like restoring the database from a backup. Because the backup is
a text file, the DBCs are misinterpreted as they are imported into the
database. The result: improper display of these characters.
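
Here is a small sketch of that kind of misinterpretation, assuming Ruby
1.9 or later (where strings carry an encoding) and assuming the pasted
text arrived as Windows-1252. The thread never names the encoding, so
that part is a guess. In Windows-1252 each of these characters is a
single byte, and the damage comes from reading the same bytes under a
different encoding:

```ruby
# Bytes as a browser might submit them from a Windows-1252 page.
raw = "\x93quoted\x94".b

# Read as Windows-1252, the bytes are curly quotes and transcode cleanly.
as_cp1252 = raw.dup.force_encoding("Windows-1252")
utf8 = as_cp1252.encode("UTF-8")
puts utf8                                   # prints “quoted”

# Read as UTF-8, the same bytes are not a valid sequence at all.
as_utf8 = raw.dup.force_encoding("UTF-8")
puts as_utf8.valid_encoding?                # prints false

# Read as ISO-8859-1, they become invisible C1 control characters.
as_latin1 = raw.dup.force_encoding("ISO-8859-1").encode("UTF-8")
puts as_latin1.codepoints.first.to_s(16)    # prints 93
```

So the stored bytes can be fine while every later step that guesses the
wrong encoding shows garbage.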

If you come up with a solution that works for these cute characters
that (whatever you call them) everyone has in their word processing
documents, let us all know. Here are some references that sort of work:

demoroniser (Perl script) www.fourmilab.ch/webtools/demoroniser

I can't attribute this second one, but it's a shell script. I tried
it on a database dump and it left me with less cleanup work -- maybe
it will provide some clues for you:

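#!/bin/sh
# Convert each named file (recursing into directories) to UTF-8 in place.
# Note: the source encoding below is euc-kr as found; text pasted from
# Word would need a different --from-code (e.g. CP1252).
this_directory=`pwd`
for x
do
  echo -n "converting $x: "
  if test "$x" = runiconv.sh; then
    echo "not editing script itself!"
  elif [ -d "$x" ]; then
    # copy the script into the subdirectory and run it on everything there
    (cp runiconv.sh "$x"; cd "$x"; sh runiconv.sh *; rm -f runiconv.sh)
  elif test -s "$x"; then
    # convert into a temp file; only overwrite the original on success
    iconv --from-code=euc-kr --to-code=UTF-8 < "$x" > "$this_directory/$x$$"
    if [ $? -eq 0 ]
    then
      cp "$this_directory/$x$$" "$x"
      rm -f "$this_directory/$x$$"
    else
      echo -n "iconv error "
      rm -f "$this_directory/$x$$"
    fi
    echo "done"
  else
    echo "original file is empty"
  fi
done
echo "all done"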
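
For anyone who would rather stay in Ruby, the per-file step of that
script could be sketched roughly as below. This assumes Ruby 1.9.3 or
later, and the helper name convert_file_to_utf8 is made up. Like the
script, it only overwrites the file when the whole conversion succeeds.

```ruby
require "tempfile"

# Hypothetical helper: rewrites the file at path as UTF-8, assuming its
# current contents are in the encoding named by from. Returns false and
# leaves the file untouched if any byte cannot be converted.
def convert_file_to_utf8(path, from)
  bytes = File.binread(path)
  converted = bytes.force_encoding(from).encode("UTF-8")
  File.binwrite(path, converted)
  true
rescue EncodingError
  false
end

# Usage with a throwaway file holding Windows-1252 curly quotes.
sample = Tempfile.new("dump")
sample.binmode
sample.write("\x93hi\x94".b)
sample.close

result = convert_file_to_utf8(sample.path, "Windows-1252")
text = File.read(sample.path, encoding: "UTF-8")
puts result
puts text
```

The source encoding is passed in rather than hard-coded because the
right value depends on where the dump came from.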