Extract vowels and consonants using Ruby Regex

Hello,

I am trying to build a regex to extract vowels and consonants from a string. So far, I am able to extract the basic a-e-i-o-u sequence using the following extension to the String class:

  class String
    def vowels
      scan(/[aeiou]/i)
    end

    def consonants
      scan(/[^aeiou]/i)
    end
  end

examples:

"Mary had a little lamb".vowels

=> aaaiea

"Mary had a little lamb".consonants

=> mryhdlttllmb

However, the regex does not accommodate the conditional treatment of 'y' as a vowel *if there is no other vowel before or after it.* If properly implemented, the previous examples would return: ayaaiea (vowels) and mrhdlttllmb (consonants).

According to this post (http://www.perlmonks.org/?node_id=592867), this could be accommodated in Perl using "zero-width negative-look-behind" and "zero-width negative-look-ahead" assertions as follows:

my @vowels = ( /[aeiou]|(?<![aeiou])y(?![aeiou])/gi );

Where, the "(?<!...)" is a "zero-width negative-look-behind assertion" and the "(?!...)" is a "zero-width negative-look-ahead assertion".

I have since discovered that Ruby 1.8 lacks regex look-behind assertions, so one can't simply translate this code fragment to Ruby regex syntax.

So, the question is: how can I accomplish the end result in Ruby (a-e-i-o-u + the conditional treatment of 'y' as a vowel *if there is no other vowel before or after it*)? Any thoughts are appreciated.
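For reference, the rule can also be sketched with no look-behind at all, by walking the characters and checking each y's neighbours by hand. This is a different technique from the Perl regex above, not a translation of it, and the module and method names (NeighbourSketch, vowels) are made up for illustration:

```ruby
# Sketch only: plain-Ruby fallback that avoids look-behind entirely.
# NeighbourSketch and its method name are hypothetical.
module NeighbourSketch
  def self.vowels(str)
    chars = str.split(//)          # one element per character
    picked = []
    chars.each_with_index do |c, i|
      if c =~ /[aeiou]/i
        picked << c                # a, e, i, o, u always count
      elsif c =~ /y/i
        prev_c = i > 0 ? chars[i - 1] : ""
        next_c = chars[i + 1] || ""
        # y counts only when neither neighbour is a vowel
        picked << c unless prev_c =~ /[aeiou]/i || next_c =~ /[aeiou]/i
      end
    end
    picked
  end
end

puts NeighbourSketch.vowels("Mary had a little lamb").join  # prints "ayaaiea"
```

It is longer than a one-line regex, but it should not depend on which regex engine the interpreter was built with.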

Dondi.

Install the oniguruma gem.

   gem install oniguruma

The gem requires the Oniguruma C lib (installable from ports); with Ruby 1.9, Oniguruma is built in as the default regex engine. It handles lookbehind.
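As a rough sketch of what that buys you, assuming Ruby 1.9 or later (so no gem is needed), the Perl pattern from the question can be used directly with scan. The module name VowelSketch is made up for illustration:

```ruby
# Sketch only: assumes Ruby 1.9 or later, whose built-in regex engine
# accepts (?<!...) look-behind. VowelSketch is a hypothetical name.
module VowelSketch
  # a, e, i, o, u always; y only when no vowel sits directly before or after it
  VOWELS = /[aeiou]|(?<![aeiou])y(?![aeiou])/i

  def self.vowels(str)
    str.scan(VOWELS)
  end
end

puts VowelSketch.vowels("Mary had a little lamb").join  # prints "ayaaiea"
```

On Ruby 1.8 the same regex literal is rejected, which is why the gem is needed there.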

--Jeremy

Thanks for the heads-up on the gem Jeremy. I was neither bold enough to attempt a custom Ruby build nor to jump to Ruby 1.9 just yet, so this is a perfect alternative.

I installed the gem and the C lib, and placed "require 'oniguruma'" in application.rb, but I am receiving a 'MissingSourceFile (no such file to load -- oniguruma)' error during application load. Here's an excerpt from the server output:

Machine:appdir User$ script/server
=> Booting Mongrel (use 'script/server webrick' to force WEBrick)
=> Rails application starting on http://0.0.0.0:3000
=> Call with -d to detach
=> Ctrl-C to shutdown server
** Starting Mongrel listening at 0.0.0.0:3000
** Starting Rails with development environment...
** Rails loaded.
** Loading any Rails specific GemPlugins
** Signals ready. TERM => stop. USR2 => restart. INT => stop (no restart).
** Rails signals registered. HUP => reload (without restart). It might not work well.
** Mongrel available at 0.0.0.0:3000
** Use CTRL-C to stop.

Processing Base#index (for 127.0.0.1 at 2008-02-03 19:47:42) [GET]
  Session ID: BAh7CToMY3NyZl9pZCIlNGNmMzg5M2Y2MmMyYzNjOTg0ZmJlZTYxZjZiOGQz%0AMzgiCmZsYXNoSUM6J0FjdGlvbkNvbnRyb2xsZXI6OkZsYXNoOjpGbGFzaEhh%0Ac2h7AAY6CkB1c2VkewA6DnJldHVybl90bzA6DHVzZXJfaWRpGQ%3D%3D--c69ffa8adf4a50532ab06c5abc6205c90235ca54
  Parameters: {}

MissingSourceFile (no such file to load -- oniguruma):
    /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `gem_original_require'
    /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `require'
    /usr/local/lib/ruby/gems/1.8/gems/activesupport-2.0.2/lib/active_support/dependencies.rb:496:in `require'
    /usr/local/lib/ruby/gems/1.8/gems/activesupport-2.0.2/lib/active_support/dependencies.rb:342:in `new_constants_in'
    /usr/local/lib/ruby/gems/1.8/gems/activesupport-2.0.2/lib/active_support/dependencies.rb:496:in `require'
    /app/controllers/application.rb:9

Line 9 of application.rb contains "require 'oniguruma'."

I followed the standard install process for installing the onig-5.9.1 package:

  1. `cd' to the directory containing the package's source code, then type `sudo ./configure'.

  2. Type `make' to compile the package.

  3. Type `make install' to install the programs, data files and documentation.

I am running Rails 2.0.2 and Ruby is installed at /usr/local on my system (which is consistent with the default oniguruma install location of /usr/local/bin and /usr/local/man), so I am at a loss for an explanation of the error. Any thoughts?

Dondi.

OK, solved the "MissingSourceFile" error by re-installing the gem. Now I am receiving the following error:

** Starting Mongrel listening at 0.0.0.0:3000
** Starting Rails with development environment...
Exiting
/usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `gem_original_require': ./lib/string_extensions.rb:4: undefined (?...) sequence: /[aeiou]|(?<![aeiou])y(?![aeiou])/ (SyntaxError)
./lib/string_extensions.rb:8: undefined (?...) sequence: /![aeiou]|(?<=[aeiou])y(?=[aeiou])/
        from /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `require'

It seems to be complaining about the look-behind and look-ahead assertions in the following code fragment:

  class String
    def vowels
      scan(/[aeiou]|(?<![aeiou])y(?![aeiou])/i)
    end

    def consonants
      scan(/![aeiou]|(?<=[aeiou])y(?=[aeiou])/i)
    end
  end

According to the Oniguruma reference (doc/RE.txt), the look-behind and look-ahead syntax appears to be correct (see section 7, "Extended groups"). This suggests either:

A. Ruby may be using the default regexp library instead of the oniguruma regexp library,

B. The oniguruma regexp library is not accessible via the 'scan' method, or

C. Something else entirely

... hmmm ... <scratches head/>

Thanks for all the help everyone. The problem was solved with the help from pullmonkey on Rails Forum! Here is the solution:

Objective:

1. Extract vowels and consonants from a string

2. Handle the conditional treatment of 'y' as a vowel under the following circumstances:
     - y is a vowel if it is surrounded by consonants
     - y is a consonant if it is adjacent to a vowel

Here is the code that works:

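  require 'oniguruma'

  # Vowels: a, e, i, o, u, plus y when no vowel is directly before or after it.
  def vowels(name_str)
    reg = Oniguruma::ORegexp.new('[aeiou]|(?<![aeiou])y(?![aeiou])')
    reg.match_all(name_str).to_s.scan(/./)
  end

  # Consonants: the explicit consonant letters, plus y when a vowel is
  # directly before or after it.
  def consonants(name_str)
    reg = Oniguruma::ORegexp.new('[bcdfghjklmnpqrstvwxz]|(?<=[aeiou])y|y(?=[aeiou])')
    reg.match_all(name_str).to_s.scan(/./)
  end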

(Note, the .scan(/./) can be eliminated to return an array)

The major problem was getting the code to accurately treat "y" as a consonant. The key to solving this problem was to:

1. define unconditional consonants explicitly (i.e., [bcdfghjklmnpqrstvwxz]) -- not as [^aeiou], which automatically includes "y" and thus overrides any conditional treatment of "y" that follows

2. define conditional "y" regexp assertions independently, i.e., "|(?<=[aeiou])y|y(?=[aeiou])" -- not "|(?<=[aeiou])y(?=[aeiou])", which only matches "y" preceded AND followed by a vowel, not preceded OR followed by a vowel
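To make both points concrete, here is a small sketch that compares the patterns side by side. It assumes Ruby 1.9 or later (where plain Regexp accepts look-behind) rather than the oniguruma gem, adds the /i flag so capitals match, and spells the consonant class out through z. The module and constant names are made up for illustration:

```ruby
# Sketch only: assumes Ruby 1.9+ (built-in look-behind). YSketch and the
# constant names are hypothetical.
module YSketch
  # Explicit consonant class (no y), then y if a vowel is before OR after it.
  CONSONANTS = /[bcdfghjklmnpqrstvwxz]|(?<=[aeiou])y|y(?=[aeiou])/i
  # The too-strict form: y only with a vowel before AND after it.
  Y_BOTH_SIDES = /(?<=[aeiou])y(?=[aeiou])/i
  # The too-greedy form: [^aeiou] grabs every y unconditionally.
  NOT_A_VOWEL = /[^aeiou]/i
end

p "yes".scan(YSketch::Y_BOTH_SIDES)   # => []
p "yes".scan(YSketch::CONSONANTS)     # => ["y", "s"]
p "Mary".scan(YSketch::NOT_A_VOWEL)   # => ["M", "r", "y"]
p "Mary".scan(YSketch::CONSONANTS)    # => ["M", "r"]
```

The "yes" case shows why the two y alternatives have to be separate, and the "Mary" case shows why [^aeiou] cannot be used for the unconditional consonants.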

HTH.