hi
I have a string: my_string="blablablabla<coordinates>substring</coordinates>blabla"
I need to extract the text between "<coordinates>" and "</coordinates>".
How can I do that? Thanks for your help. JF
my_string = "blablablabla<coordinates>substring</coordinates>blabla"
# The parentheses below define a capture group; index 1 of the match holds its contents.
sub_string = /.*<coordinates>(.*)<\/coordinates>.*/.match(my_string)
puts sub_string[1]
A regex is the quickest and most effective option for one-off text parsing. Another good option is Whytheluckystiff's Hpricot: http://code.whytheluckystiff.net/hpricot/
Hank
You probably want the regexp to be: /<coordinates>(.*)<\/coordinates>/ so there's less backtracking when the .* first tries to gobble everything.
You might also need something like: /<coordinates\b[^>]*>(.*)<\/coordinates>/ if there can be any attributes on the coordinates tag. Of course, if you really do have XML in my_string, a true parser like Hpricot or REXML will be more reliable than regular expressions. For example, if you had to match against:
"blahblah<coordinates>first one</coordinates>yadayadayada<coordinates>oops! another one</coordinates>yakyakyak"
would you want the substring to be:
"first one</coordinates>yadayadayada<coordinates>oops! another one"
(yeah, I didn't think so)
-Rob
Rob Biedenharn http://agileconsultingllc.com Rob@AgileConsultingLLC.com
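A minimal sketch of the two-tag case described above, for anyone who wants to try it. It is not from the original post: the non-greedy quantifier `.*?` and `String#scan` are swapped in here as one possible way to get "first one" instead of the over-long match, and the variable names are only illustrative.

```ruby
# Input with two <coordinates> elements, as in the example above.
text = "blahblah<coordinates>first one</coordinates>yadayadayada" \
       "<coordinates>oops! another one</coordinates>yakyakyak"

# Greedy: (.*) runs all the way to the LAST closing tag.
greedy = text[/<coordinates>(.*)<\/coordinates>/, 1]

# Non-greedy: (.*?) stops at the FIRST closing tag.
lazy = text[/<coordinates>(.*?)<\/coordinates>/, 1]

# scan collects the capture from every element; \b[^>]* tolerates
# attributes on the opening tag, and /m lets . match newlines too.
all_matches = text.scan(/<coordinates\b[^>]*>(.*?)<\/coordinates>/m).flatten

puts greedy
puts lazy
p all_matches
```

This still treats the input as flat text, so nested or commented-out tags would confuse it; a real parser remains the safer choice for genuine XML.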
Hi, I would recommend using Hpricot; you can find the documentation
here:
http://code.whytheluckystiff.net/doc/hpricot
Good luck,
-Conrad
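For comparison, here is a rough sketch of the parser route using REXML, which was mentioned earlier in the thread, rather than Hpricot. It assumes the input is well-formed XML (the <Point> string used elsewhere in this thread is) and that the rexml library is available in your Ruby installation; the variable names are illustrative only.

```ruby
require "rexml/document"

xml = "<Point><coordinates>-0.954850,46.436960,0</coordinates></Point>"

# Parse the string into a document tree.
doc = REXML::Document.new(xml)

# Find the first <coordinates> element anywhere in the document.
node = REXML::XPath.first(doc, "//coordinates")

# node is nil when there is no such element, so guard before reading its text.
coords = node && node.text

puts coords
```

Unlike a regexp, this only works when the whole string parses as XML, so it would not apply directly to a fragment with stray text around the tags.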
Hi,
/.*<coordinates>(.*)<\/coordinates>.*/ The regexp you gave works fine; I tested it with Rubular.
The problem is that I can't retrieve the substring: I always get the whole string.
Here is what I did:
irb(main):001:0> st="<Point><coordinates>-0.954850,46.436960,0</coordinates></Point>"
=> "<Point><coordinates>-0.954850,46.436960,0</coordinates></Point>"
irb(main):002:0> sub=/.*<coordinates>(.*)<\/coordinates>.*/.match(st)
=> #<MatchData:0x7f2040045fd0>
irb(main):003:0> sub.inspect
=> "#<MatchData:0x7f2040045fd0>"
irb(main):004:0> sub.to_s
=> "<Point><coordinates>-0.954850,46.436960,0</coordinates></Point>"
irb(main):005:0> sub.string
=> "<Point><coordinates>-0.954850,46.436960,0</coordinates></Point>"
irb(main):006:0> st.match(/.*<coordinates>(.*)<\/coordinates>.*/)
=> #<MatchData:0x7f2040019fc0>
irb(main):007:0> st.match(/.*<coordinates>(.*)<\/coordinates>.*/).to_s
=> "<Point><coordinates>-0.954850,46.436960,0</coordinates></Point>"
thank you for your help
Regexp#match(string) returns a MatchData object, which is more than just the matched text: it can be indexed like an Array. So:
sub[0] returns the entire matched string
sub[1], sub[2], ... return the contents of the capture groups (the parts of the pattern between parentheses).
sub[1] is therefore the thing you want to use. No need to use to_s, which gives the same result as sub[0].
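A short sketch of that indexing, reusing the string from the irb session above. The variable names whole, coords and missing are only illustrative.

```ruby
st = "<Point><coordinates>-0.954850,46.436960,0</coordinates></Point>"
sub = /.*<coordinates>(.*)<\/coordinates>.*/.match(st)

whole  = sub[0]        # the entire matched text (same as sub.to_s)
coords = sub[1]        # contents of the first pair of parentheses
groups = sub.captures  # all capture groups as a plain Array

# match returns nil when the pattern does not match, so check
# the result before indexing into it.
missing = /.*<coordinates>(.*)<\/coordinates>.*/.match("no tags here")

puts coords
p groups
p missing
```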
ah ok
thank you all for your help