Problem with regexp split

I am trying to split some text into an array, separated by one or more <br> tags.

Here is some test code:

s = "one<br>two<br><br>three<br><br><br>four"
p s.split(/(<br>)+/)

It should split into ["one","two","three","four"], because the /(<br>)+/ pattern should use one or more <br> as the delimiter to split around.

But it does this: ["one", "<br>", "two", "<br>", "three"]

Why does it do this and what split could I use to get it to work?

Note: I know that I could just fix it by removing the <br> elements from the array after it is done, but it seems that the regular expression in split should work.

Interesting. Docs say:

   If pattern is a String, then its contents are used as the delimiter when splitting str. If pattern is a single space, str is split on whitespace, with leading whitespace and runs of contiguous whitespace characters ignored.

   If pattern is a Regexp, str is divided where the pattern matches. Whenever the pattern matches a zero-length string, str is split into individual characters.

Which seems to be saying exactly what you are describing. If a regexp is used, the match isn't "eaten", but simply divided on.

You could split it on "<br>" and then remove any blank elements... not sure if that's any better than your alternative approach though.

Yeah, I have been using regexps and Ruby for years, and this is a puzzle.

It also misbehaves with this code:

s = "onexytwoxyxythreexyxyxyfour"
p s.split(/(xy)+/)

Try this:

s = "one<br>two<br><br>three<br><br><br>four"
array = s.split('<br>')
array.compact.reject { |i| i.nil? or i.empty? }

This will produce:

['one', 'two', 'three', 'four' ]

Regards,

Atc., Kirk Patrick

array.compact.reject { |i| i.nil? or i.empty? } seemed to leave some unwanted elements, at least on my Ruby 1.8.6.

But array.delete_if { |i| i.nil? or i.empty? } worked as expected on my machine.
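One possible explanation for the difference, assuming the return value of reject was not assigned to anything: reject returns a new filtered array and leaves the receiver alone, while delete_if modifies the array in place. A minimal sketch (the names cleaned and original_size are just for illustration):

```ruby
array = "one<br>two<br><br>three<br><br><br>four".split('<br>')

# reject returns a filtered copy; array itself is not changed
cleaned = array.reject { |i| i.nil? || i.empty? }
original_size = array.size

p cleaned        # the filtered copy
p array          # still contains the empty strings

# delete_if removes the matching elements from array itself
array.delete_if { |i| i.nil? || i.empty? }
p array          # now filtered in place
```

If that is what happened, assigning the result (array = array.reject { ... }) or using reject! would behave the same way as delete_if here.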

HTH, Richard

Gerry, you can do the following:

p s.gsub(/<br>/, " ").split
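A quick sketch of how this approach behaves, assuming the pattern in the gsub call is /<br>/. It works for the test string, but because split with no argument splits on any whitespace, items that contain spaces themselves get broken up too (the second string below is a made-up example):

```ruby
s = "one<br>two<br><br>three<br><br><br>four"

# Each <br> becomes a space; split with no argument then splits on
# runs of whitespace, so consecutive <br> tags collapse together.
simple = s.gsub(/<br>/, " ").split
p simple

# Hypothetical input whose items contain spaces of their own
t = "first item<br>second item"
spaced = t.gsub(/<br>/, " ").split
p spaced         # the spaces inside each item also act as delimiters
```

So this is fine when the pieces never contain whitespace, and less so otherwise.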

Good luck,

-Conrad

RichardOnRails wrote:

array.compact.reject { |i| i.nil? or i.empty? } seemed to leave some unwanted elements, at least on my Ruby 1.8.6.

But array.delete_if { |i| i.nil? or i.empty? } worked as expected on my machine.

HTH, Richard

My Ruby version is 1.8.7. But the more important thing is that the problem is solved. =P

The trick here is a feature inherited from Perl: capturing groups (in parens) in the regexp cause the matched delimiters to be included in the result. This works like you'd expect:

s.split(/(?:<br>)+/)

The ?: modifier tells the parens to group without capturing, so no backreference is created and nothing extra is added to the result.
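To see the two side by side, here is a small sketch using the test string from the original question (the variable names are only for illustration):

```ruby
s = "one<br>two<br><br>three<br><br><br>four"

# Capturing group: the text captured by the group is inserted
# into the result between the pieces.
with_capture = s.split(/(<br>)+/)

# Non-capturing group: the runs of <br> are used as delimiters and dropped.
without_capture = s.split(/(?:<br>)+/)

p with_capture
p without_capture
```

With the capturing version, only one "<br>" shows up per run of delimiters, since a repeated group keeps just its last match.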

--Matt Jones