Extract vowels and consonants using Ruby Regex

Hello,

I am trying to build a regex to extract vowels and consonants from a
string. So far, I am able to extract the basic a-e-i-o-u sequence
using the following extension to the String class:

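class String
  # Every a, e, i, o, u (either case)
  def vowels
    scan(/[aeiou]/i)
  end

  # Every other letter; [^aeiou] alone would also match spaces and punctuation
  def consonants
    scan(/[b-df-hj-np-tv-z]/i)
  end
end

Examples (results joined into a string for readability):

"Mary had a little lamb".vowels.join

=> "aaaiea"

"Mary had a little lamb".consonants.join

=> "Mryhdlttllmb"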

However, the regex does not accommodate the conditional treatment of
'y' as a vowel *if there is no other vowel before or after it.* If
properly implemented, the previous examples would return: ayaaiea
(vowels) and mrhdlttllmb (consonants).

According to this post (http://www.perlmonks.org/?node_id=592867),
this could be accommodated in Perl using "zero-width negative-look-behind"
and "zero-width negative-look-ahead" assertions as follows:

my @vowels = ( /[aeiou]|(?<![aeiou])y(?![aeiou])/gi );

Here, "(?<!...)" is a "zero-width negative-look-behind assertion" and
"(?!...)" is a "zero-width negative-look-ahead assertion": each checks
the character before or after the "y" without consuming it.
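To see what "zero-width" means in practice, here is a small sketch assuming Ruby 1.9 or later (Ruby 1.8's built-in engine has no look-behind). The assertions test the neighbouring characters without consuming them, so a substitution replaces only the "y" and leaves its neighbours untouched:

```ruby
# Requires Ruby 1.9+: Ruby 1.8 regex literals do not support look-behind.

# Negative look-behind: replace only a y that is NOT preceded by a vowel.
p "by ay".gsub(/(?<![aeiou])y/, "Y")   # the y in "by" changes, the y in "ay" does not

# Negative look-ahead: replace only a y that is NOT followed by a vowel.
p "ya yb".gsub(/y(?![aeiou])/, "Y")    # the y in "yb" changes, the y in "ya" does not

# Positive assertions on both sides: the a and b are checked, not consumed.
p "bayb".sub(/(?<=a)y(?=b)/, "Y")      # prints "baYb"
```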

I have since discovered that Ruby 1.8's regex engine lacks look-behind
assertions, so one can't simply translate this code fragment to Ruby
regex syntax.
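One way around the missing look-behind (a sketch, not something proposed in this thread) is to skip the regex for the "y" decision and inspect each y's neighbours by index. This only uses features present in Ruby 1.8 and later, and assumes ASCII input; `split_letters` is a hypothetical helper name:

```ruby
# Sketch: split a string's letters into vowels and consonants without
# regex look-behind, so it also runs on Ruby 1.8. "y" is treated as a
# vowel only when neither neighbouring character is a, e, i, o or u.
def split_letters(str)
  chars = str.split(//)          # one element per character (ASCII assumed)
  vowels = []
  consonants = []

  chars.each_with_index do |c, i|
    next unless c =~ /[a-z]/i    # skip spaces, digits, punctuation

    if c =~ /[aeiou]/i
      vowels << c
    elsif c =~ /y/i
      before = i > 0 ? chars[i - 1] : nil   # guard: chars[-1] would be the last char
      after  = chars[i + 1]                 # nil past the end of the string
      next_to_vowel = [before, after].any? { |n| n && n =~ /[aeiou]/i }
      (next_to_vowel ? consonants : vowels) << c
    else
      consonants << c
    end
  end

  [vowels, consonants]
end

v, c = split_letters("Mary had a little lamb")
puts v.join   # ayaaiea
puts c.join   # Mrhdlttllmb
```

Keeping the rule in plain Ruby makes the "no vowel before or after" condition explicit, at the cost of walking the string by hand.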

So, the question is: how can I accomplish the end result in Ruby
(a-e-i-o-u + the conditional treatment of 'y' as a vowel *if there is
no other vowel before or after it*)? Any thoughts are appreciated.

Dondi.

Install the oniguruma gem.

   gem install oniguruma

It requires the Oniguruma C library (installable from ports), which is
the built-in regex engine in Ruby 1.9. It handles look-behind.
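For comparison, a minimal sketch assuming Ruby 1.9 or later, where the built-in engine (Oniguruma, or its fork Onigmo from Ruby 2.0 on) supports look-behind: the Perl pattern from the question can then be used verbatim as an ordinary regex literal with `String#scan`, with no gem. `VOWEL_RE` is a hypothetical name:

```ruby
# Ruby 1.9+ only: look-behind is supported by the built-in regex engine.
# A vowel is a, e, i, o, u, or a y with no vowel directly before or after it.
VOWEL_RE = /[aeiou]|(?<![aeiou])y(?![aeiou])/i

vowels = "Mary had a little lamb".scan(VOWEL_RE)
puts vowels.join   # ayaaiea
```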

--Jeremy

Thanks for the heads-up on the gem, Jeremy. I wasn't bold enough to
attempt a custom Ruby build or to jump to Ruby 1.9 just yet, so this is
a perfect alternative.

I installed the gem and the C lib and placed require 'oniguruma' in
application.rb, but I am receiving a 'MissingSourceFile (no such file
to load -- oniguruma)' error during application load. Here's an
excerpt from the script/server output:

Machine:appdir User$ script/server
=> Booting Mongrel (use 'script/server webrick' to force WEBrick)
=> Rails application starting on http://0.0.0.0:3000
=> Call with -d to detach
=> Ctrl-C to shutdown server
** Starting Mongrel listening at 0.0.0.0:3000
** Starting Rails with development environment...
** Rails loaded.
** Loading any Rails specific GemPlugins
** Signals ready. TERM => stop. USR2 => restart. INT => stop (no restart).
** Rails signals registered. HUP => reload (without restart). It might not work well.
** Mongrel available at 0.0.0.0:3000
** Use CTRL-C to stop.

Processing Base#index (for 127.0.0.1 at 2008-02-03 19:47:42) [GET]
  Session ID:
BAh7CToMY3NyZl9pZCIlNGNmMzg5M2Y2MmMyYzNjOTg0ZmJlZTYxZjZiOGQz
%0AMzgiCmZsYXNoSUM6J0FjdGlvbkNvbnRyb2xsZXI6OkZsYXNoOjpGbGFzaEhh
%0Ac2h7AAY6CkB1c2VkewA6DnJldHVybl90bzA6DHVzZXJfaWRpGQ%3D%3D--
c69ffa8adf4a50532ab06c5abc6205c90235ca54
  Parameters: {}

MissingSourceFile (no such file to load -- oniguruma):
    /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `gem_original_require'
    /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `require'
    /usr/local/lib/ruby/gems/1.8/gems/activesupport-2.0.2/lib/active_support/dependencies.rb:496:in `require'
    /usr/local/lib/ruby/gems/1.8/gems/activesupport-2.0.2/lib/active_support/dependencies.rb:342:in `new_constants_in'
    /usr/local/lib/ruby/gems/1.8/gems/activesupport-2.0.2/lib/active_support/dependencies.rb:496:in `require'
    /app/controllers/application.rb:9

Line 9 of application.rb contains require 'oniguruma'.

I followed the standard install process for the onig-5.9.1 package:

  1. `cd' to the directory containing the package's source code and
     type `sudo ./configure'.

  2. Type `make' to compile the package.

  3. Type `make install' to install the programs, data files and
     documentation.

I am running Rails 2.0.2, and Ruby is installed under /usr/local on my
system (which is consistent with the default oniguruma install
locations of /usr/local/bin and /usr/local/man), so I am at a loss to
explain the error. Any thoughts?

Dondi.

OK, solved the "MissingSourceFile" error by re-installing the gem. Now
I am receiving the following error:

** Starting Mongrel listening at 0.0.0.0:3000
** Starting Rails with development environment...
Exiting
/usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `gem_original_require': ./lib/string_extensions.rb:4: undefined (?...) sequence: /[aeiou]|(?<![aeiou])y(?![aeiou])/ (SyntaxError)
./lib/string_extensions.rb:8: undefined (?...) sequence: /![aeiou]|(?<=[aeiou])y(?=[aeiou])/
        from /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `require'

It seems to be complaining about the look-behind assertions, (?<!...)
and (?<=...), in the following code fragment:

class String
  def vowels
    scan(/[aeiou]|(?<![aeiou])y(?![aeiou])/i)
  end
  def consonants
    scan(/![aeiou]|(?<=[aeiou])y(?=[aeiou])/i)
  end
end

According to this reference
(http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt), the look-behind
and look-ahead syntax appears to be correct (see section 7, "Extended
groups"). This suggests one of the following:

A. Ruby may be using the default regexp library instead of the
oniguruma regexp library,

B. The oniguruma regexp library is not accessible via the 'scan'
method, or

C. Something else entirely

... hmmm ... <scratches head/>

Thanks for all the help, everyone. The problem was solved with help
from pullmonkey on Rails Forum! Here is the solution:

Objective:

1. Extract vowels and consonants from a string
2. Handle the conditional treatment of 'y' as a vowel under the following circumstances:
     - y is a vowel if neither neighboring character is a vowel (e.g., it sits between consonants)
     - y is a consonant if it is adjacent to a vowel

Here is the code that works:

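  require 'oniguruma'   # the oniguruma gem

  # y is a vowel only when neither neighbor is a vowel
  def vowels(name_str)
    reg = Oniguruma::ORegexp.new('[aeiou]|(?<![aeiou])y(?![aeiou])')
    reg.match_all(name_str).to_s.scan(/./)
  end

  # y is a consonant when at least one neighbor is a vowel
  # (both patterns are case-sensitive; lowercase the input first if needed)
  def consonants(name_str)
    reg = Oniguruma::ORegexp.new('[bcdfghjklmnpqrstvwxz]|(?<=[aeiou])y|y(?=[aeiou])')
    reg.match_all(name_str).to_s.scan(/./)
  end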

(Note: the .scan(/./) is what splits the result into an array of single characters; without it, each method returns one string.)

The major problem was getting the code to treat "y" accurately as a
consonant. The keys to solving this problem were to:

1. define the unconditional consonants explicitly (i.e., every letter
except a, e, i, o, u and y), not as [^aeiou], which automatically
includes "y" (as well as spaces and punctuation), thus OVERRIDING any
conditional treatment of "y" that follows

2. define the conditional "y" assertions as independent alternatives,
i.e., "|(?<=[aeiou])y|y(?=[aeiou])", not "|(?<=[aeiou])y(?=[aeiou])",
which only matches a "y" preceded AND followed by a vowel, rather than
one preceded OR followed by a vowel
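On Ruby 1.9 or later, the same two patterns can be plain regex literals, so the methods can live on String as in the original question, without the gem. A sketch under that assumption: the constant names are hypothetical, z is included in the consonant class, and the /i flag (an addition, not part of the solution above) lets capitalized names match too:

```ruby
# Sketch for Ruby 1.9+: the two patterns from the solution as built-in
# regex literals on a String extension (no oniguruma gem needed).
class String
  # a, e, i, o, u, plus a y that has no vowel on either side
  VOWEL_PATTERN = /[aeiou]|(?<![aeiou])y(?![aeiou])/i

  # the other 20 letters, plus a y that has a vowel on at least one side
  CONSONANT_PATTERN = /[bcdfghjklmnpqrstvwxz]|(?<=[aeiou])y|y(?=[aeiou])/i

  def vowels
    scan(VOWEL_PATTERN)
  end

  def consonants
    scan(CONSONANT_PATTERN)
  end
end

puts "Mary had a little lamb".vowels.join       # ayaaiea
puts "Mary had a little lamb".consonants.join   # Mrhdlttllmb
```

Because the "y" conditions in the two patterns are exact complements, every letter lands in exactly one of the two results.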

HTH.