Is there a good way to convert "special" accented chars to their base chars? As an example, I want "àéìòù" => "aeiou". I'm using several gsub calls now: "àèìòù".gsub("à","a").gsub("è","e").gsub("ì","i")... It works, but I wonder if there is something better than this.
Well, something like

char_from = "àéìòù"
char_to = "aeiou"
x = "àéìòù".gsub(char_from, char_to)
puts x

would at least make the code more maintainable.
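For what it's worth, the character-for-character mapping that char_from/char_to seems to be aiming for is what String#tr does; gsub instead treats its first argument as one literal pattern, so it only matches that exact sequence. A minimal sketch, assuming Ruby 1.9 or later (where tr handles multibyte characters) and a UTF-8 source file; the helper name is made up for illustration:

```ruby
# encoding: utf-8

CHAR_FROM = "àèéìòù" # accented characters to replace
CHAR_TO   = "aeeiou" # base characters, in the same positions

# Hypothetical helper: tr replaces every occurrence of each character in
# CHAR_FROM with the character at the same position in CHAR_TO.
def strip_listed_accents(str)
  str.tr(CHAR_FROM, CHAR_TO)
end

puts strip_listed_accents("città caffè perché") # prints: citta caffe perche
```

This only covers the characters you list, so it is a drop-in replacement for the chain of gsub calls rather than a general solution.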
Your code converts only that exact sequence of characters. What I need is to convert those accented chars in every word, so "città" => "citta", "caffè" => "caffe", and so on. Maybe some regexp?
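A regexp version is possible with a lookup table: since Ruby 1.9, gsub accepts a Hash as its second argument and replaces each match with the hash value for the matched text. A sketch, assuming Ruby 1.9+; the table contents and method name are hypothetical and would need extending for other accents:

```ruby
# encoding: utf-8

# Hypothetical lookup table: accented character => base character.
ACCENT_MAP = {
  "à" => "a", "è" => "e", "é" => "e",
  "ì" => "i", "ò" => "o", "ù" => "u"
}.freeze

# An alternation that matches any single key of the table.
ACCENT_RE = Regexp.union(ACCENT_MAP.keys)

def replace_accents(str)
  # Each match is looked up in ACCENT_MAP and replaced by its value.
  str.gsub(ACCENT_RE, ACCENT_MAP)
end

puts replace_accents("città") # prints: citta
puts replace_accents("caffè") # prints: caffe
```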
Mmm.
Just noticed another problem:
char_from = "àéìòù"
char_to = "aeiou"
puts char_from.size # => 10
puts char_to.size   # => 5
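At least on my Mac. The problem here is encoding: String#size is counting bytes rather than characters, and each of these accented characters takes two bytes in UTF-8.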
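On Ruby 1.8, String#size returns the byte count, which would explain the 10. Ruby 1.9 made strings encoding-aware, so the same check looks like this there (a small sketch, assuming Ruby 1.9+ and a UTF-8 source file):

```ruby
# encoding: utf-8

SAMPLE = "àéìòù"

puts SAMPLE.encoding # prints: UTF-8
puts SAMPLE.size     # prints: 5  (characters)
puts SAMPLE.bytesize # prints: 10 (two bytes per character in UTF-8)
```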
It looks trickier than I first thought; it would be a cinch if this were Unicode and we were using Java.
Just decompose the Unicode characters and drop the combining accent characters.
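In current Ruby this decomposition is built in. A minimal sketch, assuming Ruby 2.2+ (which added String#unicode_normalize); strip_accents is a hypothetical name:

```ruby
# encoding: utf-8

def strip_accents(str)
  # NFD splits "à" into "a" followed by U+0300 COMBINING GRAVE ACCENT;
  # \p{Mn} (nonspacing marks) then matches the accents, which gsub removes.
  str.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

p "à".unicode_normalize(:nfd).codepoints # prints: [97, 768]
puts strip_accents("àéìòù")              # prints: aeiou
puts strip_accents("città caffè")        # prints: citta caffe
```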
Ignore everything I have said, and let's hope someone who knows about this can suggest a solution; I am intrigued by this.
I'm having this very same problem: String#upcase does not uppercase accented characters. It seems that the problem is the encoding again.
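Up to Ruby 2.3, upcase only changed ASCII letters, which matches this behavior. From Ruby 2.4 on, upcase applies Unicode case mapping by default, and the old ASCII-only behavior is still available through an option. A short sketch, assuming Ruby 2.4+:

```ruby
# encoding: utf-8

WORD = "città"

puts WORD.upcase         # prints: CITTÀ
puts WORD.upcase(:ascii) # prints: CITTà (only ASCII letters are changed)
```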
Regards
eugenio wrote:
Your code converts only that exact sequence of characters. What I need is to convert those accented chars in every word, so "città" => "citta", "caffè" => "caffe", and so on. Maybe some regexp?
You'll probably have to convert your text to Unicode Normalization Form D (NFD) or KD (NFKD), then filter out the combining marks.
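The choice between D and KD matters for compatibility characters: NFKD also splits things like the "ﬁ" ligature (U+FB01) into plain letters, while NFD leaves them alone. A sketch, assuming Ruby 2.2+; to_base_chars and its default form are illustrative choices, not a standard API:

```ruby
# encoding: utf-8

# Decompose with the given normalization form, then drop every
# combining mark (\p{M}).
def to_base_chars(str, form = :nfkd)
  str.unicode_normalize(form).gsub(/\p{M}/, "")
end

puts to_base_chars("caffè ﬁnito", :nfd)  # prints: caffe ﬁnito (ligature kept)
puts to_base_chars("caffè ﬁnito", :nfkd) # prints: caffe finito (ligature split)
```

Letters that have no decomposition at all, such as "ø" or "ß", pass through unchanged either way, so a small lookup table may still be needed for those.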
Best,