RegExp help

David_A_Black1 · December 12, 2006, 6:32pm

Hi --

I need some help with a regular expression for a validates_format_of statement in my model. I have a user login field, and I only want to allow alphanumeric characters and the underscore (a-z, A-Z, 1-9, _) in it. Those are the only characters I want to allow.

What is the proper Ruby RegExp to do this, for use in the :with => // option of validates_format_of?

The \w character class is all alphanumerics plus underscore -- and the \W character class is the opposite. Assuming you really don't want to allow zero, you could do:

:with => /[^\W0]+/

i.e., no character (that's the ^) that is either \W or 0.

Note, however, that there's been some flux in the question of whether or not your regex gets automatically wrapped by beginning and end-of-string anchors. That regex assumes that the anchors are added (though I hope in the long run they aren't). Try some tests, and if you need to, you can wrap it in anchors like this:

/\A[^\W0]+\z/
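
To try the anchored pattern outside of Rails, a minimal sketch along these lines may help. The names LOGIN_FORMAT and valid_login? are hypothetical, chosen only for illustration; they are not part of Rails or of the original question.

```ruby
# Hypothetical names for illustration; not part of Rails.
LOGIN_FORMAT = /\A[^\W0]+\z/

# True only if the whole string is made of word characters other than "0".
def valid_login?(login)
  !(LOGIN_FORMAT =~ login).nil?
end

["david_black", "User_19", "user0", "bad login", "ok\nnot ok", ""].each do |login|
  puts "#{login.inspect} valid? #{valid_login?(login)}"
end
```

In the model, the same regex would go in the :with option of validates_format_of.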

David

David_A_Black1 · December 12, 2006, 7:24pm

Hi --

Mark_Thomas1 · December 12, 2006, 8:14pm

While I'm a little boggled by David's answers, I think this is what you're looking for:

:with => /^[A-Za-z0-9_]+$/

Mark_Thomas1 · December 12, 2006, 8:20pm

I wrote:

While I'm a little boggled by David's answers

Should have been more specific here. I haven't seen \A and \z; I have always used ^ and $.

I think this is what you're looking for:

:with => /^[A-Za-z0-9_]+$/

More simply put:

:with => /^\w+$/

- Mark.

Jeremy_Evans · December 12, 2006, 9:05pm

irb(main):006:0> "!@#\$%*(\nAAAAA" =~ /^\w+$/
=> 8
irb(main):007:0> "!@#\$%*(\nAAAAA" =~ /\A\w+\Z/
=> nil

^ and $ match beginning and end of line, \A and \Z match beginning and end of string. You want \A and \Z.

David_A_Black1 · December 12, 2006, 9:08pm

Hi --

Mark Thomas wrote:

I wrote:

While I'm a little boggled by David's answers

Should have been more specific here. I haven't seen \A and \z; I have always used ^ and $.

I think this is what you're looking for:

:with => /^[A-Za-z0-9_]+$/

More simply put:

:with => /^\w+$/

Thanks Mark. Both yours and David's answers seem to work, but I'm using yours as it is more the style I'm used to seeing as well.

It's not a style matter; they do different things. ^ and $ anchor to beginning and end of a line, whereas \A and \z match beginning and end of string.

If you use ^ and $, you'll want to be absolutely certain that no one can ever submit a multi-line answer:

puts "Match" if /^\w+$/.match("This is\nnot\nwhat you want!")
=> Match

If you anchor to the beginning and end of the string:

puts "Match" if /\A\w+\z/.match("This is\nnot\nwhat you want!")
=> nil

which is almost certainly better.
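
As a rough sketch of why this matters for validation: with line anchors, a single clean line anywhere in the input is enough to produce a match. The sample input and the constant names below are made up for illustration.

```ruby
# Made-up input: a junk first line followed by a harmless-looking second line.
INPUT = "<script>\nadmin"

LINE_ANCHORED   = /^\w+$/
STRING_ANCHORED = /\A\w+\z/

# With ^ and $, the regex matches the second line on its own.
line_match = LINE_ANCHORED.match(INPUT)
puts line_match ? "line anchors matched #{line_match[0].inspect}" : "line anchors: no match"

# With \A and \z, the entire string has to consist of word characters.
string_match = STRING_ANCHORED.match(INPUT)
puts string_match ? "string anchors matched #{string_match[0].inspect}" : "string anchors: no match"
```

Here the line-anchored version matches "admin" and lets the whole value through, while the string-anchored version finds no match.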

David

David_A_Black1 · December 12, 2006, 9:09pm

Hi --

irb(main):006:0> "!@#\$%*(\nAAAAA" =~ /^\w+$/
=> 8
irb(main):007:0> "!@#\$%*(\nAAAAA" =~ /\A\w+\Z/
=> nil

^ and $ match beginning and end of line, \A and \Z match beginning and end of string. You want \A and \Z.

I'd go for \z, because \Z discounts a final newline:

irb(main):005:0> /abc\z/.match("abc\n")
=> nil
irb(main):006:0> /abc\Z/.match("abc\n")
=> #<MatchData:0xb7eaf2d8>

Might as well close that loophole too.
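
To see the three end anchors side by side, something like the following sketch can be run. The END_ANCHORS table is a hypothetical helper for illustration only.

```ruby
# Hypothetical lookup table: anchor name => regex using that end anchor.
END_ANCHORS = {
  "$"   => /\Aabc$/,
  "\\Z" => /\Aabc\Z/,
  "\\z" => /\Aabc\z/
}

# Try each anchor against the string with and without a trailing newline.
["abc", "abc\n"].each do |s|
  END_ANCHORS.each do |name, re|
    puts "#{name} on #{s.inspect}: #{re.match(s) ? 'match' : 'no match'}"
  end
end
```

Only \z refuses the string that ends in a newline; both $ and \Z accept it.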

David

Mark_Thomas1 · December 13, 2006, 7:37pm

> ^ and $ match beginning and end of line, \A and \Z match beginning and end of string. You want \A and \Z.

> I'd go for \z, because \Z discounts a final newline:

Thanks for the info. I must have missed the memo about Ruby regexes being different from Perl. Are there other differences, and is this documented anywhere?

Thanks.

- Mark.

David_A_Black1 · December 13, 2006, 8:13pm

Hi --

^ and $ match beginning and end of line, \A and \Z match beginning and end of string. You want \A and \Z.

I'd go for \z, because \Z discounts a final newline:

Thanks for the info. I must have missed the memo about Ruby regexes being different from Perl. Are there other differences, and is this documented anywhere?

I think the memo would have been if they were exactly the same as Perl's. The anchors should be documented in most or all extended discussions of Ruby regexes (though they may or may not mention how these compare to Perl). I've seen the second edition of the Friedl book but don't own it, and I don't remember how detailed it gets in its Ruby comparisons.

One area to focus on in the Perl/Ruby comparison is the modifiers. Since Ruby has anchors for both line and string, it doesn't need the /m modifier as it's defined in Perl. Ruby's /m modifier is like Perl's /s: it adds newline to the . character class.
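
A small sketch of the /m behaviour described above, using a made-up sample string (the constant name TEXT is only for illustration):

```ruby
TEXT = "xyz\nABC"

# Without /m, "." does not match the newline, so there is no match.
p(TEXT =~ /xyz.ABC/)   # nil

# With /m, "." also matches the newline.
p(TEXT =~ /xyz.ABC/m)  # 0

# ^ and $ anchor to lines whether or not /m is given.
p(TEXT =~ /^ABC$/)     # 4
p(TEXT =~ /^ABC$/m)    # 4
```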

David

rab · December 13, 2006, 9:27pm

FYI, Perl has \A, \Z, and \z, too. In Perl, the meaning of ^ and $ changes with the use of the /m modifier, and that's why it's common to see /ms or /xms on Perl regexps. With Ruby, I'd expect to see /m or /xm on most complex patterns.

I was surprised at how hard it was to find the modifiers in Ruby listed in the Pickaxe, but they're in chapter 22 ("The Ruby Language") starting on page 324.

The other significant way that the Perl and Ruby (1.8) regexps differ is in the semantics of executing code during the match. Perl allows code in the replacement text with the /e modifier on a substitution, where Ruby just passes the match off to a block.
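
As a rough Ruby counterpart to a Perl substitution such as s/(\d+)/$1*2/ge, a block passed to gsub computes each replacement. The sample string and constant names are made up for illustration.

```ruby
PRICES = "apples 3, pears 12"

# The block receives each matched substring; its return value is the replacement.
DOUBLED = PRICES.gsub(/\d+/) { |digits| (digits.to_i * 2).to_s }

puts DOUBLED  # apples 6, pears 24
```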

-Rob

Rob Biedenharn http://agileconsultingllc.com Rob@AgileConsultingLLC.com

David_A_Black1 · December 13, 2006, 10:27pm

Hi --

Mark_Thomas1 · December 14, 2006, 5:52pm

OK, so we have

Ruby /\Axyz\z/ is the same as Perl /^xyz$/,
Ruby /^xyz$/ is the same as Perl /^xyz$/m,
Ruby /^xyz$/m is the same as Perl /^xyz$/ms,

is this correct?

rab · December 14, 2006, 10:37pm

I think you've got it. Here are some examples of perl and ruby with some similar regexps to demonstrate.

$ perl -e '$string = "uvw\nxyz\nABC"; if ($string =~ /^xyz$/) { print "match\n" } else { print "nope\n" }'
nope

$ ruby -e 'string = "uvw\nxyz\nABC"; if (string =~ /\Axyz\z/) then print "match\n" else print "nope\n" end'
nope

$ perl -e '$string = "uvw\nxyz\nABC"; if ($string =~ /^xyz$/m) { print "match\n" } else { print "nope\n" }'
match

$ ruby -e 'string = "uvw\nxyz\nABC"; if (string =~ /^xyz$/) then print "match\n" else print "nope\n" end'
match

$ perl -e '$string = "uvw\nxyz\nABC"; if ($string =~ /^xyz....$/m) { print "match\n" } else { print "nope\n" }'
nope

$ perl -e '$string = "uvw\nxyz\nABC"; if ($string =~ /^xyz....$/ms) { print "match\n" } else { print "nope\n" }'
match

$ ruby -e 'string = "uvw\nxyz\nABC"; if (string =~ /^xyz....$/m) then print "match\n" else print "nope\n" end'
match

$ ruby -e 'string = "uvw\nxyz\nABC"; if (string =~ /^xyz....\z/m) then print "match\n" else print "nope\n" end'
match

$ ruby -e 'string = "uvw\nxyz\nABC"; if (string =~ /\A....xyz....\z/m) then print "match\n" else print "nope\n" end'
match

$ ruby -e 'string = "uvw\nxyz\nABC"; if (string =~ /\A....xyz....\z/) then print "match\n" else print "nope\n" end'
nope

$ ruby -e 'string = "uvw\nxyz\nABC"; if (string =~ /^xyz....\z/) then print "match\n" else print "nope\n" end'
nope

Rob Biedenharn http://agileconsultingllc.com Rob@AgileConsultingLLC.com
