Translating international characters

Hi

I need to convert strings with international characters to strings
with the corresponding ASCII characters. For example é, è, ë, and ê (and all
other e-related versions) should convert to e and so on.

Does anyone have a good solution for this?

Kindest regards

Erik

I once did something very crude, which for your purpose would look
something like this:

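     def preprocess(query)
       # Decompose each accented character into base letter + combining marks
       normalized = query.chars.normalize :d
       processed = ""
       normalized.u_unpack.each do |c|
         # Skip combining diacritical marks (U+0300..U+036F), keep the rest
         next if c >= 0x300 && c < 0x370
         processed << [c].pack('U*')
       end
       processed
     end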
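
On a newer Ruby the same idea can probably be sketched without the ActiveSupport chars proxy. This assumes Ruby 2.2 or later, where String#unicode_normalize is available; strip_marks is just a made-up name for illustration:

```ruby
# Sketch only: strip_marks is a made-up name, and Ruby 2.2+ is assumed
# for String#unicode_normalize.
def strip_marks(query)
  query.unicode_normalize(:nfd)                   # "é" becomes "e" + U+0301
       .codepoints
       .reject { |c| c >= 0x300 && c < 0x370 }    # drop combining marks
       .pack('U*')                                # back to a UTF-8 string
end

puts strip_marks("é, è, ë, ê")   # prints "e, e, e, e"
```

Like the version above, this only removes combining marks, so characters with no decomposition (ø or ß, for instance) pass through unchanged.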

Fred

Create a file core_extensions.rb in /lib/ and stick this in:

require 'iconv'

class String
  def to_ascii
    Iconv.iconv("ASCII//IGNORE//TRANSLIT", "UTF-8", self).join.sanitize
  rescue
    self.sanitize
  end

  def sanitize
    self.gsub(/[^a-z._0-9 -]/i, "").downcase
  end
end

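Restart your Rails server to load the file. Then, when you want to convert a string, you just do something like "Thïs ïs à téststrïng".to_ascii and it will convert the characters to their ASCII equivalents. Note that sanitize also downcases the result and strips anything other than letters, digits, spaces, dots, underscores and hyphens.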
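
One thing worth knowing about the rescue branch: if the Iconv call raises, only sanitize runs, and that step deletes accented letters rather than transliterating them. A small illustration, where sanitize_only is a hypothetical standalone copy of the sanitize step with no Iconv involved:

```ruby
# Illustration only: sanitize_only is a hypothetical standalone copy of the
# sanitize step, with no Iconv transliteration in front of it.
def sanitize_only(str)
  str.gsub(/[^a-z._0-9 -]/i, "").downcase
end

sanitize_only("Hello, World!")   # => "hello world"
sanitize_only("téststrïng")      # => "tststrng" (accented letters are dropped)
```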

Best regards

Peter De Berdt

convert é, è, ë, and ê .. to e, etc...

Try
    str = DiacriticsFu::escape(source)
with the following

in the file /lib/diacritic_fu.rb:
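module DiacriticsFu
  # Decomposes the string (NFD), then drops every multi-byte character,
  # which removes the combining marks. Relies on Ruby 1.8, where
  # String#length counts bytes.
  def self.escape(str)
    ActiveSupport::Multibyte::Handlers::UTF8Handler.
      normalize(str, :d).
      split(//u).
      reject { |e| e.length > 1 }.
      join
  end
end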

This is by Thibaut Barrère
(found here: http://groups.google.ca/group/MephistoBlog/browse_thread/thread/afe817a4a594ddde
and there's even a test suite).

For example, I extended String with:
class String
  # "Un été À la maison".to_slug(true) == "un-ete-a-la-maison"
  def to_slug(force_downcase=false)
    str = DiacriticsFu::escape(self)
    str.gsub!(/[^a-zA-Z0-9 ]/,"")
    str.gsub!(/[ ]+/," ")
    str.gsub!(/ /,"-")
    force_downcase ? str.downcase : str
  end
end
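
If ActiveSupport's UTF8Handler is not available, a rough self-contained equivalent might look like the following. It is only a sketch: slugify and ascii_fold are made-up names, ascii_fold stands in for DiacriticsFu::escape, and Ruby 2.2 or later is assumed for String#unicode_normalize.

```ruby
# Sketch only: slugify and ascii_fold are made-up names; ascii_fold stands in
# for DiacriticsFu::escape and assumes Ruby 2.2+ (String#unicode_normalize).
def ascii_fold(str)
  # Decompose, then drop every non-ASCII character (including the marks)
  str.unicode_normalize(:nfd).gsub(/[^\x00-\x7F]/, "")
end

def slugify(str, force_downcase = false)
  slug = ascii_fold(str)
           .gsub(/[^a-zA-Z0-9 ]/, "")   # keep only letters, digits and spaces
           .gsub(/ +/, "-")             # turn each run of spaces into one dash
  force_downcase ? slug.downcase : slug
end

slugify("Un été À la maison", true)   # => "un-ete-a-la-maison"
```

It is written as plain methods rather than as a String extension so it can be tried in irb without touching the core class.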

Alain

Great advice from everybody. I will try these and see how they work.
Thanks.

Erik