RegExp help

Hi --

I need some help with a regular expression for a validates_format_of statement in my model. I have a user login field, and I only want to allow it to contain alphanumeric characters and the underscore (a-z, A-Z, 1-9, _). Those are the only characters I want to allow.

What is the proper Ruby RegExp for this, to use in the :with => // option of validates_format_of?

The \w character class is all alphanumerics plus underscore -- and the \W character class is the opposite. Assuming you really don't want to allow zero, you could do:

   :with => /[^\W0]+/

i.e., no character (that's the ^) that is either \W or 0.

Note, however, that there's been some flux in the question of whether or not your regex gets automatically wrapped by beginning and end-of-string anchors. That regex assumes that the anchors are added (though I hope in the long run they aren't). Try some tests, and if you need to, you can wrap it in anchors like this:

   /\A[^\W0]+\z/
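
A quick way to "try some tests" outside of Rails is to run the anchored pattern against a few sample logins in plain Ruby. This is only a sketch: the constant name, the helper method, and the sample logins are made up for illustration and are not part of validates_format_of.

```ruby
# Hypothetical names; only the pattern itself comes from the post above.
LOGIN_FORMAT = /\A[^\W0]+\z/

# True when the whole string consists of word characters other than "0".
def valid_login?(login)
  !!(login =~ LOGIN_FORMAT)
end

["david_1", "Mark9", "user0", "bad-name", "abc\ndef"].each do |login|
  puts "#{login.inspect}: #{valid_login?(login)}"
end
```

The first two samples pass; the others fail because they contain a zero, a hyphen, and a newline respectively.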

David

Hi --

While I'm a little boggled by David's answers, I think this is what you're looking for:

:with => /^[A-Za-z0-9_]+$/

I wrote:

> While I'm a little boggled by David's answers

Should have been more specific here. I haven't seen \A and \z; I have always used ^ and $.

> I think this is what you're looking for:
>
> :with => /^[A-Za-z0-9_]+$/

More simply put:

:with => /^\w+$/
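
For anyone who wants to confirm that the two character classes accept the same characters, here is a small sketch that compares them over the ASCII range. The variable names are arbitrary, and it only looks at ASCII input.

```ruby
# Every single-character ASCII string, from code point 0 to 127.
ascii = (0..127).map(&:chr)

# Characters accepted by the explicit class and by the \w shorthand.
explicit  = ascii.select { |c| c =~ /\A[A-Za-z0-9_]\z/ }
shorthand = ascii.select { |c| c =~ /\A\w\z/ }

puts "same set: #{explicit == shorthand}"
puts "count: #{shorthand.size}"
```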

- Mark.

irb(main):006:0> "!@#\$%*(\nAAAAA" =~ /^\w+$/
=> 8
irb(main):007:0> "!@#\$%*(\nAAAAA" =~ /\A\w+\Z/
=> nil

^ and $ match beginning and end of line, \A and \Z match beginning and end of string. You want \A and \Z.

Hi --

> Thanks Mark. Both yours and David's answers seem to work, but I'm using yours as it is more the style I'm used to seeing as well.

It's not a style matter; they do different things. ^ and $ anchor to beginning and end of a line, whereas \A and \z match beginning and end of string.

If you use ^ and $, you'll want to be absolutely certain that no one can ever submit a multi-line answer:

   puts "Match" if /^\w+$/.match("This is\nnot\nwhat you want!")    => Match

If you anchor to the beginning and end of the string:

   puts "Match" if /\A\w+\z/.match("This is\nnot\nwhat you want!")    => nil

which is almost certainly better.
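
To see exactly what the line-anchored version lets through, the following sketch prints the part of the input that matched. The constant names are invented for the example; the input string is the one used above.

```ruby
# Hypothetical constant names for the two anchoring styles.
LINE_ANCHORED   = /^\w+$/
STRING_ANCHORED = /\A\w+\z/

input = "This is\nnot\nwhat you want!"

# With ^ and $, any single line made of word characters is enough.
line_match = LINE_ANCHORED.match(input)
puts "line anchors:   #{line_match ? "matched #{line_match[0].inspect}" : 'no match'}"

# With \A and \z, the whole string has to be word characters.
string_match = STRING_ANCHORED.match(input)
puts "string anchors: #{string_match ? "matched #{string_match[0].inspect}" : 'no match'}"
```

The line-anchored pattern succeeds because the middle line "not" satisfies it on its own, even though the rest of the input is nothing like a valid login.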

David

Hi --

> ^ and $ match beginning and end of line, \A and \Z match beginning and end of string. You want \A and \Z.

I'd go for \z, because \Z discounts a final newline:

   irb(main):005:0> /abc\z/.match("abc\n")
   => nil
   irb(main):006:0> /abc\Z/.match("abc\n")
   => #<MatchData:0xb7eaf2d8>

Might as well close that loophole too :-)
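
Applied to the login pattern from this thread, the difference looks like the sketch below. The sample value "mark\n" is made up; it stands for a login submitted with a trailing newline.

```ruby
login = "mark\n"

# \Z also matches just before a final newline, so this still succeeds.
upper_z = (login =~ /\A\w+\Z/)

# \z only matches at the very end of the string, so this fails.
lower_z = (login =~ /\A\w+\z/)

p upper_z
p lower_z
```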

David

> ^ and $ match beginning and end of line, \A and \Z match beginning and end of string. You want \A and \Z.

> I'd go for \z, because \Z discounts a final newline:

Thanks for the info. I must have missed the memo about Ruby regexes being different from Perl. Are there other differences and Is this documented anywhere?

Thanks.

- Mark.

Hi --

I think the memo would have been if they were exactly the same as Perl's :-) The anchors should be documented in most or all extended discussions of Ruby regexes (though they may or may not mention how these compare to Perl). I've seen the second edition of the Friedl book ("Mastering Regular Expressions") but don't own it, and I don't remember how detailed it gets in its Ruby comparisons.

One area to focus on in the Perl/Ruby comparison is the modifiers. Since Ruby has anchors for both line and string, it doesn't need the /m modifier as it's defined in Perl. Ruby's /m modifier is like Perl's /s: it adds newline to the . character class.
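
A minimal sketch of what /m does in Ruby, using a made-up two-line string:

```ruby
text = "uvw\nxyz"

# Without /m, the dot refuses to match the newline between the two lines.
without_m = (text =~ /uvw.xyz/)

# With /m, the dot matches the newline as well.
with_m = (text =~ /uvw.xyz/m)

p without_m
p with_m
```

Note that /m has no effect on ^ and $ here; in Ruby they are line anchors with or without it.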

David

FYI, Perl has \A, \Z, and \z, too. In Perl, the meaning of ^ and $ changes with the use of the /m modifier, and that's why it's common to see /ms or /xms on Perl regexps. With Ruby, I'd expect to see /m or /xm on most complex patterns.

I was surprised at how hard it was to find the modifiers in Ruby listed in the Pickaxe, but they're in chapter 22 ("The Ruby Language") starting on page 324.

The other significant way that the Perl and Ruby (1.8) regexps differ is in the semantics of executing code during the match. Perl allows code in the replacement text with the /e modifier on a substitution, where Ruby just passes the match off to a block.
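
As a rough illustration of the block form, here is a sketch that doubles every number in a string. The sample string and variable names are invented; the Perl counterpart would be something along the lines of s/(\d+)/$1*2/ge.

```ruby
prices = "apple 3, pear 12"

# The block receives each matched substring; its return value
# becomes the replacement text.
doubled = prices.gsub(/\d+/) { |number| (number.to_i * 2).to_s }

puts doubled
```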

-Rob

Rob Biedenharn http://agileconsultingllc.com Rob@AgileConsultingLLC.com

Hi --

OK, so we have

Ruby /\Axyz\z/ is the same as Perl /^xyz$/,
Ruby /^xyz$/ is the same as Perl /^xyz$/m,
Ruby /^xyz$/m is the same as Perl /^xyz$/ms,

is this correct?

I think you've got it. Here are some examples of perl and ruby with some similar regexps to demonstrate.

$ perl -e '$string = "uvw\nxyz\nABC"; if ($string =~ /^xyz$/) { print "match\n" } else { print "nope\n" }'
nope

$ ruby -e 'string = "uvw\nxyz\nABC"; if (string =~ /\Axyz\z/) then print "match\n" else print "nope\n" end'
nope

$ perl -e '$string = "uvw\nxyz\nABC"; if ($string =~ /^xyz$/m) { print "match\n" } else { print "nope\n" }'
match

$ ruby -e 'string = "uvw\nxyz\nABC"; if (string =~ /^xyz$/) then print "match\n" else print "nope\n" end'
match

$ perl -e '$string = "uvw\nxyz\nABC"; if ($string =~ /^xyz....$/m) { print "match\n" } else { print "nope\n" }'
nope

$ perl -e '$string = "uvw\nxyz\nABC"; if ($string =~ /^xyz....$/ms) { print "match\n" } else { print "nope\n" }'
match

$ ruby -e 'string = "uvw\nxyz\nABC"; if (string =~ /^xyz....$/m) then print "match\n" else print "nope\n" end'
match

$ ruby -e 'string = "uvw\nxyz\nABC"; if (string =~ /^xyz....\z/m) then print "match\n" else print "nope\n" end'
match

$ ruby -e 'string = "uvw\nxyz\nABC"; if (string =~ /\A....xyz....\z/m) then print "match\n" else print "nope\n" end'
match

$ ruby -e 'string = "uvw\nxyz\nABC"; if (string =~ /\A....xyz....\z/) then print "match\n" else print "nope\n" end'
nope

$ ruby -e 'string = "uvw\nxyz\nABC"; if (string =~ /^xyz....\z/) then print "match\n" else print "nope\n" end'
nope
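
The Ruby one-liners above can also be collected into a single script, which makes the results easier to compare side by side. This is just a convenience sketch: the hash layout and labels are my own, while the test string and patterns are the ones used above.

```ruby
string = "uvw\nxyz\nABC"

# Label => pattern, in the same order as the Ruby one-liners above.
patterns = {
  '/\Axyz\z/'            => /\Axyz\z/,
  '/^xyz$/'              => /^xyz$/,
  '/^xyz....$/m'         => /^xyz....$/m,
  '/^xyz....\z/m'        => /^xyz....\z/m,
  '/\A....xyz....\z/m'   => /\A....xyz....\z/m,
  '/\A....xyz....\z/'    => /\A....xyz....\z/,
  '/^xyz....\z/'         => /^xyz....\z/
}

# Record "match" or "nope" for each pattern, then print a small table.
results = patterns.map { |label, re| [label, (string =~ re) ? "match" : "nope"] }.to_h

results.each do |label, outcome|
  puts "#{label.ljust(22)} #{outcome}"
end
```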

Rob Biedenharn http://agileconsultingllc.com Rob@AgileConsultingLLC.com