Dmitry, my gut feeling is that you have to enforce POST encoding in the
form at least, or otherwise detect when you have not received a UTF-8
encoded POST data string.
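For the form side, the usual lever is the accept-charset attribute (a sketch; the exact markup depends on your templates, and the field names here are made up):

```html
<!-- Ask the browser to submit the form body as UTF-8 -->
<form action="/comments" method="post" accept-charset="UTF-8">
  <textarea name="comment[body]"></textarea>
  <input type="submit" value="Post">
</form>
```

Note that some older browsers ignore accept-charset, which is why server-side detection is still needed on top of this.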
I am at a loss as to how a Latin-1 string ended up *bigger* than a UTF-8
one, but it's possible that you encountered some cut-and-paste
artifacts. Try entering an umlaut using the character map (i.e. more
naturally).
Well yes, but it's not Rails' fault. In fact anyone can pass any
kind of information to any kind of web system. Your system has
to be robust enough to handle it.
Even with your best efforts to ensure everything comes across as
UTF-8, users can still force it to be something that won't display
properly, like Latin-1, Shift-JIS or whatever. In those cases you
have to detect that you have received an invalid encoding and
either convert it to UTF-8 or send back an error message.
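A minimal detect-or-convert sketch in Ruby (using the 1.9+ String encoding API; `ensure_utf8` is a hypothetical helper name, and falling back to a Latin-1 reinterpretation is an assumption — you might prefer to reject the request instead):

```ruby
# Hypothetical helper: check whether an incoming parameter is valid UTF-8,
# and if not, reinterpret the bytes as ISO-8859-1 and transcode (an
# assumption -- you could just as well raise and send back an error).
def ensure_utf8(str)
  utf8 = str.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?
  # Bytes are not valid UTF-8; treat them as Latin-1 and convert.
  str.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

ensure_utf8("caf\xC3\xA9".b)  # valid UTF-8 passes through => "café"
ensure_utf8("caf\xE9".b)      # Latin-1 bytes get transcoded => "café"
```

`force_encoding` only relabels the bytes, while `encode` actually transcodes them, which is why both appear here.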
I just thought that a particularly clever hacker might be able
to exploit encoding confusion with multi-byte encoding
systems to get around cross-site-scripting defences. It's just a thought,
and I am thinking in general, not in a Rails context (which has
some fairly serious XSS defences).
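One concrete shape this confusion takes (an illustrative sketch, not a working exploit) is the non-shortest-form UTF-8 trick: the bytes 0xC0 0xBC are an overlong encoding of "<". A strict decoder rejects them, but a lenient one decodes them to "<", so a filter that scans for the literal 0x3C byte sees nothing while a lenient consumer may still see a tag:

```ruby
# "\xC0\xBC" is an overlong (non-shortest-form) encoding of "<" (0x3C),
# and "\xC0\xBE" of ">" (0x3E). Both are illegal UTF-8.
payload = "\xC0\xBCscript\xC0\xBE".b

# A naive byte-level filter never sees a literal angle bracket...
filtered = payload.bytes.any? { |b| b == 0x3C || b == 0x3E }
# => false

# ...but a strict decoder flags the sequence, which is why rejecting
# invalid UTF-8 outright closes this class of hole.
valid = payload.dup.force_encoding(Encoding::UTF_8).valid_encoding?
# => false
```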