help needed with regular expression

Hi

I'm working through the "Best of Ruby Quiz" book, which some of you might be familiar with, but don't worry if not, you can probably still help me :) I've found a regular expression that does what I want, but I'm not quite sure why it works.

Given:

story = "The ((velocity)) ((colour)) ((wildbeast)) ((action)) over the ((adjective)) ((domesticbeast))"

I want to parse this into an array such that each element of the array is the string split on the "((blabla))" bits. This does that:

irb(main):052:0> story.split /\(\(.*?\)\)/
=> ["The ", " ", " ", " ", " over the ", " "]

However, I also want the sections marked "((blabla))" included as well. I fiddled a bit and got this, which works:

irb(main):053:0> story.split /(\(\(.*?\)\))+/
=> ["The ", "((velocity))", " ", "((colour))", " ", "((wildbeast))", " ", "((action))", " over the ", "((adjective))", " ", "((domesticbeast))"]

However, I'm not exactly sure what makes this work. Can anyone illuminate this for me?

glenn

String#split will normally take a pattern representing a delimiter, and split the string into parts that are separated by the delimiter, returning the parts.

However, if you enclose the pattern in capturing parens, then split returns both the parts *and* the delimiters.

So:

  >> "foo-bar-baz".split(/-/)
  => ["foo", "bar", "baz"]
  >> "foo-bar-baz".split(/(-)/)
  => ["foo", "-", "bar", "-", "baz"]

Your pattern is enclosed in parens, so each delimiter it matches gets returned along with the parts between the delimiters.

The pattern is:

  (\(\(.*?\)\))+

Working from the inside:

  \(\(   two literal left parens, followed by
  .*?    match shortest sequence of any char except \n, followed by
  \)\)   two literal right parens

This is wrapped in (), which are capturing parens (since they aren't escaped with a backslash)
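To see that it is the capturing, and not the grouping as such, that makes split return the delimiters, here is a small sketch reusing the "foo-bar-baz" string from above. The (?:...) form is Ruby's non-capturing group; the variable names are just for illustration.

```ruby
text = "foo-bar-baz"

# Capturing group: split returns the delimiter along with the parts.
capturing = text.split(/(-)/)

# Non-capturing group: still grouped, but nothing is captured,
# so split drops the delimiter as usual.
non_capturing = text.split(/(?:-)/)

p capturing      # => ["foo", "-", "bar", "-", "baz"]
p non_capturing  # => ["foo", "bar", "baz"]
```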

The pattern is followed by a +, which means "occurring one or more times". You may not want this, because it would treat "((foo))((bar))" as a single delimiter, and the capture group would keep only its last repetition, "((bar))".
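A quick sketch of that pitfall, using a made-up string with two adjacent placeholders (the string and variable names are only for illustration, they are not from your story):

```ruby
# Hypothetical input: two placeholders with nothing between them.
adjacent = "a ((foo))((bar)) b"

# With the trailing +, both placeholders are consumed as one delimiter,
# and the capture group holds only its last repetition.
with_plus = adjacent.split(/(\(\(.*?\)\))+/)

# Without the +, each placeholder is its own delimiter. The empty string
# is the (empty) text between the two placeholders.
without_plus = adjacent.split(/(\(\(.*?\)\))/)

p with_plus     # => ["a ", "((bar))", " b"]
p without_plus  # => ["a ", "((foo))", "", "((bar))", " b"]
```

In your story every placeholder is separated by at least a space, so the + never gets a chance to repeat, which is why both versions give the same result there.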

Now when you split on this, you get all the "((sometext))" elements, together with the stuff in between them.

If you just want to capture the "((sometext))" words and not the text in between, you should look at String#scan.
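Roughly, scan could be used like this on your story string. This is only a sketch; the second pattern, which moves the capturing parens inside the literal ones to get the bare names, is my own variation and not something from the book.

```ruby
story = "The ((velocity)) ((colour)) ((wildbeast)) ((action)) over the ((adjective)) ((domesticbeast))"

# With no capture group, scan returns each full match.
placeholders = story.scan(/\(\(.*?\)\)/)

# With a capture group, scan returns only the captured text of each match,
# as an array of one-element arrays; flatten turns that into plain strings.
names = story.scan(/\(\((.*?)\)\)/).flatten

p placeholders  # => ["((velocity))", "((colour))", "((wildbeast))", "((action))", "((adjective))", "((domesticbeast))"]
p names         # => ["velocity", "colour", "wildbeast", "action", "adjective", "domesticbeast"]
```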