How to detect if a string contains any funny characters from non-English alphabets

Hello!

I have a specific problem that maybe you can help with.

Given an input string (up to 1000 characters), how do I detect whether it is not written in the "Roman alphabet (A-Z)"? By that I mean the string may be in Chinese or Korean, or may contain non-standard English or other Western-language characters, etc.

I just need a function that will return "True" or "False".

Thanks in advance!

-Chris

Well I looked at the API and thought that str.each_char {} might work, but it's not recognized in my version of Ruby (1.8.6 on Ubuntu) so that seems to be a dead end. Here's an ugly hack that works though:

def nonroman_test(str)
  if nonroman(str) then
    puts "#{str} has nonroman characters!"
  else
    puts "#{str} does not have nonroman characters!"
  end
end

def nonroman(str)
  (/^[\w\s!@#\$%\^\\&*()\]\[,.?]*$/ =~ str) == nil
end

nonroman_test("abc")
nonroman_test("abcᴚ")

nonroman(str) returns true if the string contains any characters besides letters, digits, underscores, whitespace, and the following: !@#$%^&*()[]\,.?

You can alter the regular expression to change what it allows. Just add any additional allowed characters before the final ] on the line in nonroman(). Some characters may need a \ in front of them to work.
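As a rough sketch of that kind of alteration, here is a variant that also allows the hyphen, apostrophe, and double quote. The method name nonroman_extended and the choice of extra characters are assumptions for illustration only, and the escaped strings assume a Ruby that understands \u escapes (1.9 or later).

```ruby
# Hypothetical variant of nonroman() that additionally allows - ' and ".
# The hyphen is escaped so it is not read as a range inside [...].
def nonroman_extended(str)
  (/^[\w\s!@#\$%\^\\&*()\]\[,.?\-'"]*$/ =~ str) == nil
end

# Only allowed characters, so no nonroman characters are reported.
puts nonroman_extended("it's a well-known \"fact\"")
# U+1D1A is outside the allowed set, so this one is reported.
puts nonroman_extended("abc\u1D1A")
```

The new characters go just before the closing ] of the character class, as described above.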

Hope that helps!

Regards, David Alves

Hi --

Well I looked at the API and thought that str.each_char {} might work, but it's not recognized in my version of Ruby (1.8.6 on Ubuntu)

Right now, the latest release of Ruby 1.8 is 1.8.7, which is basically a backport of many features from 1.9. The result is that the current API docs have a very 1.9-ish flavor, and you'll see lots of things in there that don't exist in 1.8.6. It's potentially kind of confusing since many of us are still using 1.8.6, and 1.8.7 sounds like it will be more like 1.8.6 than like 1.9. But you can still get the 1.8.6 docs too.

so that seems to be a dead end. Here's an ugly hack that works though:

def nonroman_test(str)
  if nonroman(str) then
    puts "#{str} has nonroman characters!"
  else
    puts "#{str} does not have nonroman characters!"
  end
end

def nonroman(str)
  (/^[\w\s!@#\$%\^\\&*()\]\[,.?]*$/ =~ str) == nil
end

A better way might be:

   def nonroman(str)
     str =~ /[^\w\s!...]/
   end

(with whatever regex you use). This way, you're testing for the first non-roman character, rather than testing all the characters. It returns nil or an integer (the index of the first match); change as needed if you specifically need true/false.
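If a strict true/false is needed, one possible sketch is below. The method name nonroman? is hypothetical, and the allowed set is an assumption copied from the whitelist regex earlier in the thread; substitute whatever set you actually want.

```ruby
# Sketch: true if str contains at least one character outside the allowed set.
# The allowed set here is an assumption taken from the earlier whitelist regex.
def nonroman?(str)
  # =~ returns the index of the first disallowed character, or nil if none.
  # The double negation turns that into true/false (index 0 is truthy in Ruby).
  !!(str =~ /[^\w\s!@#\$%\^\\&*()\]\[,.?]/)
end

puts nonroman?("abc")
puts nonroman?("abc\u1D1A")
```

Because the negated class needs no ^ or $ at all, this form also inspects every line of a multi-line string.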

Also, don't forget that ^ and $ are line anchors, not string anchors.
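To illustrate the difference, here is a small sketch comparing line anchors with the string anchors \A and \z. Both method names are hypothetical, and the character class is cut down to [\w\s] to keep the example short.

```ruby
# Line-anchored: ^ and $ can match around any single line in the string.
def nonroman_line_anchored(str)
  (/^[\w\s]*$/ =~ str) == nil
end

# String-anchored: \A and \z match only at the very start and very end.
def nonroman_string_anchored(str)
  (/\A[\w\s]*\z/ =~ str) == nil
end

# A plain first line followed by a non-roman character on the second line.
input = "abc\n\u1D1A"

puts nonroman_line_anchored(input)    # false: the first line alone satisfies ^...$
puts nonroman_string_anchored(input)  # true: the whole string must match
```

With the line-anchored pattern, a single clean line is enough for the match to succeed, so the non-roman character on the second line goes unnoticed.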

David