Is apostrophe (') something special in a regex if at end?

(Ruby 1.9.2) I have a simple validation regex that needs to pass the following values: "Billy-Bob" and "O'Kelley" (as test cases). Originally I was not allowing the apostrophe, but it became apparent I had to allow it.

The initial regex was:

/^[a-zA-Z -]*$/

Now, when I added the apostrophe like this:

/^[a-zA-Z' -']*$/

Then for some reason "Billy-Bob" was not getting matched:

"Billy-Bob" =~ /^[a-zA-Z -']*$/

nil

"O'Kelley" =~ /^[a-zA-Z' -']*$/

0

But when I moved the apostrophe further in, things worked as desired and expected:

"Billy-Bob" =~ /^[a-zA-Z' -]*$/

0

"O'Kelley" =~ /^[a-zA-Z' -]*$/

0

Why is this?


It is the minus that is the special character here (as in a-z). If you escape the minus, it works:

ruby-1.9.2-p0 > "Billy-Bob" =~ /^[a-zA-Z \-']*$/
 => 0
ruby-1.9.2-p0 > "O'Kelley" =~ /^[a-zA-Z \-']*$/
 => 0

Colin
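To see what the unescaped version actually matches, here is a small sketch (not from the thread; the variable names are only illustrative). It assumes plain ASCII input and lists every ASCII character accepted by the class [ -'], which Ruby reads as a range from space (0x20) to apostrophe (0x27):

```ruby
# [ -'] is parsed as a range: space (0x20) through apostrophe (0x27).
range_class = /[ -']/

# Collect every ASCII character that the class accepts.
matched = (0..127).map(&:chr).select { |c| c =~ range_class }

puts matched.inspect         # the eight characters from " " to "'"
puts matched.include?("-")   # false: the hyphen is 0x2D, outside the range
```

The hyphen itself falls outside that range, which is why "Billy-Bob" stops matching once the apostrophe is appended after the hyphen.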


Yes, I see you are right. The weird part is that the minus is not treated as a special character in the first examples ("Billy-Bob" =~ /^[a-zA-Z' -]*$/ returns 0!) ... Anyhow, I will remember that and start escaping it.

Yes I wondered about that. Either it is a bug or a documented feature that - does not need to be escaped in some circumstances. Or perhaps [a-zA-Z -] means something that neither of us understands.

Colin

The minus (hyphen) in a charset is un-special if it is at the beginning or the end. You’re better off escaping it yourself for exactly the reason you encountered – adding another character to the end changed the meaning of the regular expression (charset) in a way you didn’t expect.

-Rob

Rob Biedenharn

Rob@AgileConsultingLLC.com http://AgileConsultingLLC.com/

rab@GaslightSoftware.com http://GaslightSoftware.com/
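As a rough illustration of the point about hyphen position, the sketch below runs the thread's two test names against three variants of the character class. The constant names NAMES and PATTERNS and the labels are made up for this example; the patterns themselves are the ones quoted in the thread.

```ruby
NAMES = ["Billy-Bob", "O'Kelley"]

PATTERNS = {
  "hyphen last"          => /^[a-zA-Z' -]*$/,   # trailing hyphen is a literal
  "hyphen in the middle" => /^[a-zA-Z -']*$/,   # " -'" is a range, space..apostrophe
  "hyphen escaped"       => /^[a-zA-Z \-']*$/   # escaped hyphen is always a literal
}

# For each pattern, record the =~ result (match index or nil) per name.
results = PATTERNS.map do |label, pattern|
  [label, NAMES.map { |name| name =~ pattern }]
end.to_h

results.each { |label, r| puts "#{label}: #{r.inspect}" }
# hyphen last: [0, 0]
# hyphen in the middle: [nil, 0]
# hyphen escaped: [0, 0]
```

Only the escaped form keeps its meaning no matter what is later added before or after the hyphen.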

Makes sense, thanks