I need some help on RegEx to detect First and Last names. This is what I
This is used to detect a First and Last name where two words are next to
each other that begin with a capital letter. So it will detect:
I run into problems where the name is close to the beginning of the
sentence:
Having John Smith over for dinner. --- This will look at "Having John"
Getting Jane Smith ready for school. --- This will look at "Getting Jane"
Do you know how to do a RegEx where it will ignore the first word
whenever three capitalized words are next to each other? Thanks!
First, check whether there are three capitalized words in a row:
if str.match(/([A-Z]+[a-zA-Z]* [A-Z]+[a-zA-Z]* [A-Z]+[a-zA-Z]*)/)
  # Three capitalized words in a row: do something
elsif str.match(/([A-Z]+[a-zA-Z]* [A-Z]+[a-zA-Z]*)/)
  # Two capitalized words in a row: do something
end
I hope this helps.
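To make the idea above concrete, here is a minimal sketch of how the two checks could be combined so that the first of three capitalized words is dropped. The helper name extract_name and the constant CAP_WORD are made up for illustration, and the sketch assumes words are separated by a single space.

```ruby
# Hypothetical helper (not from the thread): returns "First Last" or nil.
# CAP_WORD is the same capitalized-word pattern used above.
CAP_WORD = /[A-Z]+[a-zA-Z]*/

def extract_name(str)
  # Try three capitalized words first and keep only the last two.
  if (m = str.match(/\b#{CAP_WORD} (#{CAP_WORD}) (#{CAP_WORD})\b/))
    "#{m[1]} #{m[2]}"
  # Otherwise fall back to the first pair of capitalized words.
  elsif (m = str.match(/\b(#{CAP_WORD}) (#{CAP_WORD})\b/))
    "#{m[1]} #{m[2]}"
  end
end

puts extract_name("Having John Smith over for dinner.")   # John Smith
puts extract_name("Getting Jane Smith ready for school.") # Jane Smith
```

Note that this always throws away the first of three capitalized words, so a genuine three-part name would lose its first part.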
That's close. You want something like
Which gives you
=> "Having Jane Smith"
irb(main):022:0> x =~ /\A([A-Z]+[a-zA-Z]*)\s+([A-Z]+[a-zA-Z]*)\s+([A-Z]
You know this is not something you're going to solve with regular
expressions, though, right?
"San Francisco's Jane Smith, quoted in Broder's Washington Post
article, said ..."
You need a lot more heuristics than a simple RegEx to reliably find
names in a block of text.
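A quick way to see the problem is to run a naive two-capitalized-words pattern over that sentence. This is only a sketch; NAIVE_NAME is an illustrative pattern, not the one from the original question.

```ruby
# Naive pattern: two adjacent words that each start with a capital letter.
NAIVE_NAME = /\b[A-Z][a-z]+ [A-Z][a-z]+\b/

text = "San Francisco's Jane Smith, quoted in Broder's Washington Post " \
       "article, said ..."

# String#scan returns every non-overlapping match as an array of strings.
candidates = text.scan(NAIVE_NAME)
p candidates
# => ["San Francisco", "Jane Smith", "Washington Post"]
```

Only one of the three candidates is a person; the pattern alone cannot tell a city or a newspaper from a name.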
Some other cases to consider:
John Philip Sousa (or if you're a kid at heart, John Jacob Jingelheimer
Smith), not to mention Spanish names, which can have MANY parts.
Robert De Niro
Jesus Mary and Joseph
Surnames with origins in some languages don't start with a capital
letter:
Michael Henry de Young - Dutch
Wernher von Braun - German
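If you want to catch a few of these, one option is to allow an optional lowercase particle in front of the surname. The sketch below is a rough illustration: the PARTICLES list is an assumed, deliberately tiny sample, and first_and_last is a made-up helper name.

```ruby
# Assumed, illustrative list of lowercase surname particles; far from complete.
PARTICLES = %w[de von van].freeze
PARTICLE_RE = Regexp.union(PARTICLES)

# First name, then a surname that may be preceded by one particle.
NAME_RE = /\b([A-Z][a-z]+) ((?:#{PARTICLE_RE} )?[A-Z][a-z]+)\b/

# Hypothetical helper: returns [first, last] or nil.
def first_and_last(str)
  m = str.match(NAME_RE)
  m && [m[1], m[2]]
end

p first_and_last("Wernher von Braun built rockets.") # ["Wernher", "von Braun"]
p first_and_last("John Smith came over.")            # ["John", "Smith"]
```

This still breaks on middle names and on capitalized particles, so "Michael Henry de Young" and "Robert De Niro" are not handled correctly.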
Thanks for the suggestions. I'm going to play around with this.
For the most part, I'm doing detection for scenarios with two names, so
names like Robert De Niro will not come up.
Rick Denatale wrote:
I'm pretty sure, though, that the actor would say he HAD two names, and
his first name was "Robert" and his last name was "De Niro".