I'm expecting a validates_format_of with a regex like this
/^[a-zA-Z\xC0-\xD6\xD9-\xF6\xF9-\xFF\.\'\-\ ]*?$/
to allow common accented characters such as ö, é and å to be submitted via a web form.
However, the extended characters are being rejected.
This one, however, works fine (it is just a-zA-Z):
/^[\x41-\x5A\x61-\x7A\.\'\-\ ]*?$/
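A likely explanation, assuming the form data arrives as UTF-8 (the usual case when the MySQL side is utf8): in UTF-8, é is stored as the two bytes C3 A9, not as the single Latin-1 byte E9 that a range like \xC0-\xFF is aimed at. ASCII letters, by contrast, are one byte whose value equals the code point, which is why the a-zA-Z version behaves. The snippet assumes Ruby 1.9 or later and only inspects the bytes; it is a diagnostic sketch, not a fix.

```ruby
# encoding: utf-8
# Assumes Ruby 1.9+ with UTF-8 strings (UTF-8 is the default source encoding in 2.0+).
e_acute = "é"

# UTF-8 stores "é" as two bytes, neither of which is 0xE9.
e_bytes = e_acute.bytes.to_a.map { |b| format("%02X", b) }
puts e_bytes.inspect          # prints ["C3", "A9"]

# The code point is U+00E9, the value a Latin-1 range such as \xC0-\xFF expects.
e_codepoint = e_acute.ord.to_s(16).upcase
puts e_codepoint              # prints E9

# ASCII letters are one byte each, equal to their code points (A = 0x41, z = 0x7A).
ascii_bytes = "Az".bytes.to_a
puts ascii_bytes.inspect      # prints [65, 122]
```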
So, what's the secret to using Unicode character ranges in Ruby regexes or Rails validations?
It also seems to fail with four-digit escapes such as \x0000. Is there a limit at \xFF?
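On the \x0000 point: in a Ruby string literal, `\x` reads at most two hex digits, so "\x0000" is a NUL byte followed by the characters "00", not U+0000. Ruby 1.9 and later add `\uHHHH` (and `\u{...}`) escapes for code points, and these also work inside regex character classes. A minimal sketch, assuming Ruby 1.9+ and UTF-8 source; the variable names are illustrative only.

```ruby
# encoding: utf-8
# \x reads at most two hex digits, so "\x0000" is "\x00" followed by "00".
x_escape = "\x0000"
puts x_escape.bytes.to_a.inspect   # prints [0, 48, 48]

# \u reads exactly four hex digits and yields the code point U+00E9 ("é").
u_escape = "\u00E9"
puts u_escape == "é"               # prints true

# The same \u escapes express the question's Latin-1 letter ranges in a regex.
latin1_letters = /\A[\u00C0-\u00D6\u00D9-\u00F6\u00F9-\u00FF]+\z/
match_index = latin1_letters =~ "öéå"
puts match_index.inspect           # prints 0
```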
OK, so now that I've come to realize that Unicode support in Ruby totally blows, are there any hacks out there?
I want to:
- allow a website visitor to enter the "usual" extended Latin characters into a web form
- use a regular expression (this is the crux of the problem) to ensure that every character in the string is allowed
- save that data to MySQL (utf8)
- display it with the correct characters intact
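The whitelist step in that list can be sketched in plain Ruby, assuming Ruby 1.9 or later and that input arrives as a UTF-8 String. `NAME_FORMAT` and `allowed_name?` are hypothetical names; the character set is the one from the question, with the Latin-1 ranges rewritten as `\u` escapes. It also swaps `^`/`$` for `\A`/`\z`, since `^` and `$` only anchor a single line, and drops the lazy `*?`, which buys nothing when the pattern is anchored at both ends.

```ruby
# encoding: utf-8
# Hypothetical whitelist for a name field: ASCII letters, the question's
# Latin-1 letter ranges as \u escapes, plus period, apostrophe, hyphen, space.
# \A and \z anchor the whole string; ^ and $ would anchor only one line.
NAME_FORMAT = /\A[a-zA-Z\u00C0-\u00D6\u00D9-\u00F6\u00F9-\u00FF.'\- ]*\z/

# Hypothetical helper: true if every character of str is on the whitelist.
# Assumes str is a UTF-8 String.
def allowed_name?(str)
  # Matching a regex against invalid UTF-8 raises ArgumentError, so reject it first.
  return false unless str.valid_encoding?
  !!(NAME_FORMAT =~ str)
end

["Zoë Ångström", "O'Brien-Smith", "Robert'); DROP TABLE", "Anna\n<script>"].each do |name|
  puts "#{name.inspect}: #{allowed_name?(name)}"
end
```

In a Rails model the same constant would plug into `validates_format_of :name, :with => NAME_FORMAT`; recent Rails versions reject `^`/`$` in format validations unless `:multiline => true` is given, which is another reason to prefer `\A`/`\z`. If letters from any script should pass, `\p{L}` in place of the explicit ranges is a broader alternative.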
It's no problem to capture the text, store it and redisplay it, but only without filtering/validation, which of course is not acceptable.
Is anyone doing whitelist character validation like this?