Regular Expression help

Hi everyone, I hope it's appropriate to post this question here. I
couldn't find anywhere else to post it (posting to the Ruby group
didn't work). I'm trying to write a regular expression to turn this
string:

'johnny went to the [shops] and played [games,soccer,foot ball] with
others.'

into this:

['johnny went to the', ['shops'], 'and played', ['games', 'soccer',
'foot ball'], 'with others.']

Key features are that white space needs to be stripped from around the
snippets, and I need to be able to determine which parts are just
normal text and which parts are to be used as gaps where people fill
in the missing info. If you haven't already figured it out, this is
for a children's quiz engine. When a phrase appears in square brackets
it's a blank, and the system will display a text box for the user to
type into. If there are multiple items in the square brackets separated
by commas (with optional spaces around the commas from lazy users) then
the system will display the choices in a random order in a drop-down
box (the first choice in the list is the correct one).
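
The display rule described above could be sketched roughly as follows. The method name `field_for` and the hash keys are hypothetical, made up for illustration; the only facts taken from the description are that a single item means a text box, several items mean a drop-down in random order, and the first item is the correct answer.

```ruby
# Hypothetical helper: decides how one parsed blank should be presented.
# `choices` is the array of strings taken from one pair of square brackets.
def field_for(choices)
  if choices.length == 1
    # A single item is a plain blank the user types into.
    { :type => :text_box, :answer => choices.first }
  else
    # The first choice is the correct one; shuffle returns a new array,
    # so the original order (and the correct answer) is left untouched.
    { :type => :drop_down, :answer => choices.first, :options => choices.shuffle }
  end
end
```

Keeping the correct answer separate from the shuffled options means the random display order never affects marking.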

I've had a long go at it myself but I just don't fully grasp the
regexp language:

irb(main):027:0> 'johnny went to the [shops] and played [games,soccer,foot ball] with others.'.split(/\[(?:([(?:\s)\w(?:\s)]+)(?:,([(?:\s)\w(?:\s)]+))*)\]/)
=> ["johnny went to the ", "shops", " and played ", "games", "foot ball", " with others."]

I've tried doing groupings but it seems to make things worse :)

Any help would be truly appreciated :)

Cheers,

Brendon
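
A note on why 'soccer' goes missing in the irb output above: when a capture group sits inside a repeated group, only its last repetition is kept, so the second group ends up holding 'foot ball' and nothing else. Also, inside a character class the `(?:` and `)` are literal characters rather than grouping, so `[(?:\s)\w(?:\s)]` matches the same set as `[\w\s()?:]`. The sketch below shows the effect with a simplified pattern, then one common way around it, which is to capture the whole bracket body once and split it afterwards. Both helper names are hypothetical, and both assume the text contains a bracket.

```ruby
# Hypothetical helper: same shape as the attempted pattern, on one bracket.
def repeated_group_captures(text)
  m = text.match(/\[([\w\s]+)(?:,([\w\s]+))*\]/)
  [m[1], m[2]] # m[2] holds only the final repetition of the repeated group
end

# Hypothetical helper: capture the whole bracket body once, then split it.
def bracket_items(text)
  body = text[/\[([^\]]*)\]/, 1]
  body.split(',').map { |s| s.strip }
end

p repeated_group_captures('[games,soccer,foot ball]') # prints ["games", "foot ball"]
p bracket_items('[games, soccer ,foot ball]')         # prints ["games", "soccer", "foot ball"]
```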

This is kinda ghetto (particularly needing to check chunk.length) but it works on your sample string.

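def parse_item(lin)
  ret = Array.new
  # split also returns the captured groups, so a bracketed chunk ends in ']'
  # and the text before a blank ends in '['
  re = Regexp.new(/([^\[\]]+?(\[|\]))+?/)
  lin.split(re).each do |chunk|
    # skip the empty strings and lone brackets that split returns as well
    if chunk.length > 1 then
      case chunk
        when /\]$/
          # bracketed option list; strip stray spaces around each option
          ret << chunk[0..-2].split(',').map { |opt| opt.strip }
        else
          # plain chunk of text
          ret << chunk.gsub(/\[$/, '').strip
      end
    end
  end
  # printable form of the array; return ret instead to get the array itself
  ret.inspect
end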

HTH,

-Roy

Woah dude! Thanks for that :) I'll work my way through understanding
it now :)

Cheers!

Brendon

Here's what we came up with using StringScanner:

require 'strscan'

# correct_answers is assumed to be an attribute of the surrounding class
def processed_answer
  scanner = StringScanner.new(correct_answers)
  # last position the scanner was in
  last_pos = 0
  output = []

  # non-greedy matching so we only swallow up to the first closing bracket
  while scanner.scan_until(/\[(.+?)\]/)
    # scanner[0] contains the string matched by the regexp
    # scanner[1] contains the value captured by the parentheses in the regexp

    # position of the opening bracket
    bracket_pos = scanner.pos - scanner[0].length

    # the 'chunk' is the text from the last position the scanner stopped at
    # up to (not including) the opening bracket; the exclusive range also
    # copes with a string that starts with a bracket
    chunk = scanner.string[last_pos...bracket_pos].strip
    output << chunk unless chunk.empty?

    # store the position the scanner is at for next time round the loop
    last_pos = scanner.pos

    # extract the question answer thingies, stripping whitespace around each
    output << scanner[1].split(',').map { |v| v.strip }
  end

  # make sure we keep any text after the last question/answer
  last_chunk = scanner.string[last_pos..-1].strip
  output << last_chunk unless last_chunk.empty?

  output
end
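
For comparison, a more compact sketch of the same idea using String#split with a single capture group. When the pattern contains a capture group, split includes the captured text in its result, so the bracket bodies land at the odd indexes and the surrounding text at the even ones. The method name `split_blanks` is hypothetical, and this version quietly drops empty brackets and empty text chunks, which may or may not be what the quiz engine wants.

```ruby
# Hypothetical method: plain text comes back as Strings, blanks as Arrays.
def split_blanks(text)
  # One capture group, so split returns text and bracket bodies alternately.
  parts = text.split(/\[([^\[\]]*)\]/)
  parts.each_with_index.map { |part, i|
    # Odd indexes are bracket bodies: split on commas and strip each choice.
    # Even indexes are ordinary text: just strip the surrounding whitespace.
    i.odd? ? part.split(',').map { |s| s.strip } : part.strip
  }.reject { |part| part.empty? }
end

p split_blanks('johnny went to the [shops] and played [games,soccer,foot ball] with others.')
# prints ["johnny went to the", ["shops"], "and played", ["games", "soccer", "foot ball"], "with others."]
```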