extract a substring

hi

I have a string:
my_string="blablablabla<coordinates>substring</coordinates>blabla"

I need to extract the text between "<coordinates>" and "</coordinates>".

How can I do that?
Thanks for your help
JF

my_string="blablablabla<coordinates>substring</coordinates>blabla"
# the parentheses below define the capture group within the overall regex pattern
sub_string = /.*<coordinates>(.*)<\/coordinates>.*/.match(my_string)
# index 1 is the first capture group; index 0 would be the whole match
puts sub_string[1]

A regex is the quickest and most effective option for one-off text parsing.
Another good option is Whytheluckystiff's Hpricot:
http://code.whytheluckystiff.net/hpricot/

Hank

You probably want the regexp to be:
  /<coordinates>(.*)<\/coordinates>/
so there's less backtracking when the .* first tries to gobble everything.

You might also need something like:
  /<coordinates\b[^>]*>(.*)<\/coordinates>/
if there can be any attributes on the coordinates tag.

Of course, if you really do have XML in my_string, a true parser like Hpricot or REXML will be more reliable than regular expressions. For example, if you had to match against:
  "blahblah<coordinates>first one</coordinates>yadayadayada<coordinates>oops! another one</coordinates>yakyakyak"

would you want the substring to be:
  "first one</coordinates>yadayadayada<coordinates>oops! another one"
Yeah, I didn't think so ;-)

-Rob

Rob Biedenharn http://agileconsultingllc.com
Rob@AgileConsultingLLC.com
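
To make the greedy-match problem above concrete, here is a minimal sketch (not from the original posts) that runs the greedy pattern and a non-greedy variant, (.*?), against the two-element example string. The variable names are only illustrative, and it assumes the content contains no newlines, since . does not match a newline without the /m flag.

```ruby
# Sample input with two <coordinates> elements, as in the example above.
text = "blahblah<coordinates>first one</coordinates>yadayadayada" \
       "<coordinates>oops! another one</coordinates>yakyakyak"

# Greedy (.*) runs on to the LAST closing tag in the string.
greedy = text[/<coordinates>(.*)<\/coordinates>/, 1]

# Non-greedy (.*?) stops at the FIRST closing tag it can reach.
first = text[/<coordinates>(.*?)<\/coordinates>/, 1]

# String#scan returns one array of captures per match;
# flatten turns [["a"], ["b"]] into ["a", "b"].
all_matches = text.scan(/<coordinates\b[^>]*>(.*?)<\/coordinates>/).flatten

puts greedy
puts first
p all_matches
```

The non-greedy form is still only a workaround for simple, flat input; for real XML a parser remains the more reliable choice, as noted above.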

Hi, I would recommend using Hpricot; you can find the documentation here:

http://code.whytheluckystiff.net/doc/hpricot

Good luck,

-Conrad

hi

The regexp you gave, /.*<coordinates>(.*)<\/coordinates>.*/, works
fine; I tested it with Rubular.

The problem is that I can't retrieve the substring; I always get the whole string.

Here is what I did:

irb(main):001:0> st="<Point><coordinates>-0.954850,46.436960,0</coordinates></Point>"

=> "<Point><coordinates>-0.954850,46.436960,0</coordinates></Point>"
irb(main):002:0> sub=/.*<coordinates>(.*)<\/coordinates>.*/.match(st)
=> #<MatchData:0x7f2040045fd0>
irb(main):003:0> sub.inspect
=> "#<MatchData:0x7f2040045fd0>"
irb(main):004:0> sub.to_s
=> "<Point><coordinates>-0.954850,46.436960,0</coordinates></Point>"
irb(main):005:0> sub.string
=> "<Point><coordinates>-0.954850,46.436960,0</coordinates></Point>"
irb(main):006:0> st.match(/.*<coordinates>(.*)<\/coordinates>.*/)
=> #<MatchData:0x7f2040019fc0>
irb(main):007:0> st.match(/.*<coordinates>(.*)<\/coordinates>.*/).to_s
=> "<Point><coordinates>-0.954850,46.436960,0</coordinates></Point>"

thank you for your help


Regexp#match returns a MatchData object, which is more than just the
matched text: it can be indexed like an Array. So:
sub[0] returns the entire matched string
sub[1], sub[2], ... return the text captured by each group
(the parts of the pattern between parentheses).

sub[1] is therefore the thing you want to use. No need to use to_s.
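
As a small sketch of the indexing described above, using the string from the irb session: the leading and trailing .* are dropped here, so index 0 covers only the tagged section. Splitting the captured text into numbers is an extra step added purely for illustration, and the nil check is there because match returns nil when nothing matches.

```ruby
st = "<Point><coordinates>-0.954850,46.436960,0</coordinates></Point>"
m = /<coordinates>(.*)<\/coordinates>/.match(st)

if m
  whole  = m[0]  # entire matched text, tags included
  coords = m[1]  # text captured by the first (and only) group
  puts coords

  # Illustrative extra step: turn the comma-separated text into floats.
  values = coords.split(",").map(&:to_f)
  p values
else
  puts "no <coordinates> element found"
end
```

m.captures returns all captured groups as a plain Array, which can be handier than indexing when a pattern has several groups.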

ah ok

thank you all for your help