One approach is to transliterate your input, e.g.:
Unidecode!
-- Sean M. Burke, Unidecode!, 2001
That way, "Chrétien" becomes "chretien" or some such for the purpose
of your search, but remains "Chrétien" in the text.
For example, both El-Aaiún and El-Aaiun could reference the same
underlying text:
http://svr225.stepx.com:3388/El-Aaiún
http://svr225.stepx.com:3388/El-Aaiun
This looks really promising, but after reading up on this for a while, I
don't see how to get it to work with Rails... could you give me a few
pointers or direct me to some documentation?
At its core, Unidecode is simply a lookup table. It should be rather straightforward to port to Ruby if it hasn't been done already.
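To give a rough idea of what such a port might look like, here is a minimal Ruby sketch of the lookup-table idea. It is not the actual Unidecode code: the table name TRANSLIT_TABLE, the handful of entries in it, the transliterate method and the '?' fallback are all made up for illustration, and the real tables cover vastly more of Unicode.

```ruby
# Hypothetical, tiny stand-in for the real Unidecode tables.
# Keys are written as escapes to avoid look-alike characters.
TRANSLIT_TABLE = {
  "\u00E9" => 'e', # LATIN SMALL LETTER E WITH ACUTE
  "\u00FA" => 'u', # LATIN SMALL LETTER U WITH ACUTE
  "\u041C" => 'M', # CYRILLIC CAPITAL LETTER EM
  "\u043E" => 'o', # CYRILLIC SMALL LETTER O
  "\u0441" => 's', # CYRILLIC SMALL LETTER ES
  "\u043A" => 'k', # CYRILLIC SMALL LETTER KA
  "\u0432" => 'v', # CYRILLIC SMALL LETTER VE
  "\u0430" => 'a'  # CYRILLIC SMALL LETTER A
}.freeze

# Replace each character by its table entry. ASCII passes through
# unchanged; anything missing from the table becomes '?' (an
# arbitrary choice made for this sketch).
def transliterate(text)
  text.each_char.map do |char|
    char.ascii_only? ? char : TRANSLIT_TABLE.fetch(char, '?')
  end.join
end

puts transliterate("Chr\u00E9tien")
puts transliterate("\u041C\u043E\u0441\u043A\u0432\u0430")
```

A full port would mostly be a matter of loading the complete tables in place of the eight entries above.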
Here is the original Perl implementation:
And below is a Lua port of it:
http://dev.alt.textdrive.com/browser/HTTP/Unidecode.lua
As well as the lookup tables themselves:
http://dev.alt.textdrive.com/browser/HTTP/Unidecode
Usage example:
local Unidecode = require( 'Unidecode' )
print( 1, 'Москва́', Unidecode( 'Москва́' ) )
print( 2, '北京', Unidecode( '北京' ) )
print( 3, 'Ἀθηνᾶ', Unidecode( 'Ἀθηνᾶ' ) )
print( 4, '서울', Unidecode( '서울' ) )
print( 5, '東京', Unidecode( '東京' ) )
print( 6, '京都市', Unidecode( '京都市' ) )
print( 7, 'नेपाल', Unidecode( 'नेपाल' ) )
> 1 Москва́ Moskva
> 2 北京 beijing
> 3 Ἀθηνᾶ Athena
> 4 서울 seoul
> 5 東京 dongjing
> 6 京都市 jingdushi
> 7 नेपाल nepaal
If Unidecode is too much of a good thing, one could use iconv translit or such, e.g. iconv( 'utf-8', 'us-ascii//TRANSLIT' )...
One way or another, the crux of it is to transliterate your data as well as your query, and then use the latter to search the former.
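In Ruby, that could be sketched roughly as follows. This is only an illustration under a few assumptions: the method names search_key, build_index and search are invented, and the folding step is neither Unidecode nor iconv but plain Unicode decomposition from Ruby's standard library, which strips accents from Latin text and leaves other scripts as they are.

```ruby
# Fold a string into a search key: decompose accented letters (NFD),
# drop the combining marks, and lowercase what is left.
# This is a stand-in for a real transliterator such as Unidecode.
def search_key(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, '').downcase
end

# Map each folded key to the original, unmodified titles.
def build_index(titles)
  titles.group_by { |title| search_key(title) }
end

# Fold the query the same way as the data, then look it up.
def search(index, query)
  index.fetch(search_key(query), [])
end

index = build_index(["Chr\u00E9tien", "El-Aai\u00FAn"])
puts search(index, 'chretien').inspect
puts search(index, 'El-Aaiun').inspect
```

The stored titles keep their accents; only the keys used for matching are folded, so both spellings of a query land on the same entry.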
Cheers,