extract a substring

hi

I have a string: my_string="blablablabla<coordinates>substring</coordinates>blabla"

I need to extract the text between "<coordinates>" and "</coordinates>".

How can I do that? Thanks for your help.

JF

    my_string = "blablablabla<coordinates>substring</coordinates>blabla"
    # the parentheses below define the capture group inside the overall regex pattern
    sub_string = /.*<coordinates>(.*)<\/coordinates>.*/.match(my_string)
    # [0] is the whole match; [1] is the text captured by the parentheses
    puts sub_string[1]

A regex is the fastest and most effective option for one-off text parsing. Another good option is Whytheluckystiff's Hpricot: http://code.whytheluckystiff.net/hpricot/

Hank

You probably want the regexp to be:

    /<coordinates>(.*)<\/coordinates>/

so there's less backtracking when the .* first tries to gobble everything.

You might also need something like:

    /<coordinates\b[^>]*>(.*)<\/coordinates>/

if there can be any attributes on the coordinates tag.

Of course, if you really do have XML in my_string, a true parser like Hpricot or REXML will be more reliable than regular expressions. For example, if you had to match against:

    "blahblah<coordinates>first one</coordinates>yadayadayada<coordinates>oops! another one</coordinates>yakyakyak"

would you want the substring to be:

    "first one</coordinates>yadayadayada<coordinates>oops! another one"

(yeah, I didn't think so ;-)

-Rob

Rob Biedenharn http://agileconsultingllc.com Rob@AgileConsultingLLC.com
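To make Rob's two-tag example concrete, here is a minimal sketch comparing the greedy pattern with a non-greedy variant. The non-greedy `(.*?)` form and the `String#scan` call are not something proposed in the thread; they are shown here only as one possible way around the problem, and the method names are made up for illustration.

```ruby
# Rob's two-tag example string, with both closing tags written out.
TWO_TAGS = "blahblah<coordinates>first one</coordinates>" \
           "yadayadayada<coordinates>oops! another one</coordinates>yakyakyak"

# Greedy: (.*) runs on to the LAST </coordinates> in the string.
def greedy_capture(text)
  match = %r{<coordinates>(.*)</coordinates>}.match(text)
  match && match[1]
end

# Non-greedy: (.*?) stops at the FIRST </coordinates> it can reach.
def lazy_capture(text)
  match = %r{<coordinates>(.*?)</coordinates>}.match(text)
  match && match[1]
end

# With one capture group, String#scan returns an array of one-element
# arrays, so flatten turns it into a plain list of captured strings.
def all_captures(text)
  text.scan(%r{<coordinates>(.*?)</coordinates>}).flatten
end

puts greedy_capture(TWO_TAGS)  # first one</coordinates>yadayadayada<coordinates>oops! another one
puts lazy_capture(TWO_TAGS)    # first one
p all_captures(TWO_TAGS)       # ["first one", "oops! another one"]
```

This still treats the input as plain text, so nested tags, comments, or CDATA sections would need a real parser, as Rob says.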

Hi, I would recommend using Hpricot. You can find the documentation here:

http://code.whytheluckystiff.net/doc/hpricot

Good luck,

-Conrad
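For readers who want to see the parser route in code, here is a rough sketch. It uses REXML (the other parser Rob mentions) rather than Hpricot, simply because REXML ships with standard Ruby installs; it is a stand-in, not Hpricot's API. The method name `coordinates_text` is invented for this example, and the input is assumed to be well-formed XML with a single root element.

```ruby
require "rexml/document"

# Returns the text of the first <coordinates> element found anywhere in the
# document, or nil when there is no such element.
# Assumption: xml is well-formed (one root element, properly closed tags).
def coordinates_text(xml)
  doc = REXML::Document.new(xml)
  element = REXML::XPath.first(doc, "//coordinates")
  element && element.text
end

puts coordinates_text("<Point><coordinates>-0.954850,46.436960,0</coordinates></Point>")
# prints -0.954850,46.436960,0
```

Note that the original my_string, with plain text before and after the tag, is not a well-formed XML document, so a strict parser may reject it; this approach fits input like the `<Point>` fragment better.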

hi

The regexp you gave works fine; I tested it with Rubular:

    /.*<coordinates>(.*)<\/coordinates>.*/

The problem is that I can't retrieve the substring; I always get the whole string.

Here is what I did:

    irb(main):001:0> st="<Point><coordinates>-0.954850,46.436960,0</coordinates></Point>"
    => "<Point><coordinates>-0.954850,46.436960,0</coordinates></Point>"
    irb(main):002:0> sub=/.*<coordinates>(.*)<\/coordinates>.*/.match(st)
    => #<MatchData:0x7f2040045fd0>
    irb(main):003:0> sub.inspect
    => "#<MatchData:0x7f2040045fd0>"
    irb(main):004:0> sub.to_s
    => "<Point><coordinates>-0.954850,46.436960,0</coordinates></Point>"
    irb(main):005:0> sub.string
    => "<Point><coordinates>-0.954850,46.436960,0</coordinates></Point>"
    irb(main):006:0> st.match(/.*<coordinates>(.*)<\/coordinates>.*/)
    => #<MatchData:0x7f2040019fc0>
    irb(main):007:0> st.match(/.*<coordinates>(.*)<\/coordinates>.*/).to_s
    => "<Point><coordinates>-0.954850,46.436960,0</coordinates></Point>"

thank you for your help

Regexp#match(string) returns a MatchData object, which is more than just the matched text: it can be indexed like an Array. So:

    sub[0] returns the entire matched string
    sub[1], sub[2], ... return the text captured by each pair of parentheses (the capture groups)

sub[1] is therefore the thing you want to use. No need to use to_s, which returns the same text as sub[0]; that is why you kept getting the whole string.
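As a quick illustration of the indexing described above, here is a small sketch using the string from the irb session. The constant `COORD_PATTERN` and the helper `match_parts` are names invented for this example; `MatchData#[]`, `#to_s`, `#captures`, and `String#[]` with a regexp and group index are standard Ruby.

```ruby
COORD_PATTERN = /.*<coordinates>(.*)<\/coordinates>.*/

# Collects the different views of one match so they can be compared.
# Returns nil when the pattern does not match at all.
def match_parts(text)
  sub = COORD_PATTERN.match(text)
  return nil if sub.nil?

  {
    whole: sub[0],            # the entire matched string
    to_s: sub.to_s,           # same text as sub[0]
    first_group: sub[1],      # text captured by the first pair of parentheses
    groups: sub.captures,     # all captured groups as an Array
    shortcut: text[COORD_PATTERN, 1] # String#[] with a group index, no MatchData needed
  }
end

parts = match_parts("<Point><coordinates>-0.954850,46.436960,0</coordinates></Point>")
puts parts[:whole]        # the full <Point>...</Point> string
puts parts[:first_group]  # -0.954850,46.436960,0
```

Because `match` returns nil when nothing matches, indexing the result directly would raise NoMethodError on unexpected input, so a nil check like the one above is worth keeping.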

ah ok

thank you all for your help