Problem with validates uniqueness and accented characters

I have a table 'client' with a unique index on column 'nome'
I have also set validates :uniqueness => true on column 'nome'

There is a record 'João Victor Santos' on my clients table. When I try
to insert another 'João Victor Santos' Rails executes the following
command to check if a record with this name already exists:

SELECT `clients`.`id` FROM `clients` WHERE (`clients`.`nome` = BINARY
'João Victor Santos') LIMIT 1

Unfortunately this query returns 0 rows and validation passes and I
receive this stack trace

Mysql::Error: Duplicate entry 'João Victor Santos' for key 2

If I execute this query in phpMyAdmin it correctly returns 1 result

SELECT `clients`.`id` FROM `clients` WHERE (`clients`.`nome` = 'João
Victor Santos') LIMIT 1

I'm using Rails 3.0.3, Ruby 1.8.7 and the mysql gem. I can't switch to
mysql2 gem because I'm running on Windows and this application will also
run on Windows in production

Has anyone faced this problem?

Victor Cisneiros wrote in post #969792:

I have a table 'client' with a unique index on column 'nome'
I have also set validates :uniqueness => true on column 'nome'

There is a record 'João Victor Santos' on my clients table. When I try
to insert another 'João Victor Santos' Rails executes the following
command to check if a record with this name already exists:

SELECT `clients`.`id` FROM `clients` WHERE (`clients`.`nome` = BINARY
'João Victor Santos') LIMIT 1

Unfortunately this query returns 0 rows and validation passes and I
receive this stack trace

Mysql::Error: Duplicate entry 'João Victor Santos' for key 2

If I execute this query in phpMyAdmin it correctly returns 1 result

SELECT `clients`.`id` FROM `clients` WHERE (`clients`.`nome` = 'João
Victor Santos') LIMIT 1

Perhaps you've got an encoding issue.

I'm using Rails 3.0.3, Ruby 1.8.7 and the mysql gem. I can't switch to
mysql2 gem because I'm running on Windows and this application will also
run on Windows in production

You should absolutely not be running Rails applications on Windows in
production. Find a way to host the app on *nix if at all possible.

Has anyone faced this problem?

Best,

I tried running on my linux VPS with Ruby 1.9.2 and the mysql2 gem and
the problem still persists

The funny thing is that I have a view where you can filter all the
clients and if I make it filter this particular client by running this
query

SELECT `clients`.* FROM `clients` WHERE (nome LIKE '%João Victor
Santos%') ORDER BY nome ASC LIMIT 30 OFFSET 0

It correctly returns 1 row

I'm starting to think the problem is with the BINARY in the validates
uniqueness query, anyone knows a way to monkey-patch it to remove the
BINARY keyword?

Please quote when replying.

Victor Cisneiros wrote in post #969879:

I tried running on my linux VPS with Ruby 1.9.2 and the mysql2 gem and
the problem still persists

The funny thing is that I have a view where you can filter all the
clients and if I make it filter this particular client by running this
query

SELECT `clients`.* FROM `clients` WHERE (nome LIKE '%João Victor
Santos%') ORDER BY nome ASC LIMIT 30 OFFSET 0

It correctly returns 1 row

I'm starting to think the problem is with the BINARY in the validates
uniqueness query, anyone knows a way to monkey-patch it to remove the
BINARY keyword?

That's probably not the issue. Did you investigate the encodings as I
suggested in my earlier post?

Best,

That's probably not the issue. Did you investigate the encodings as I
suggested in my earlier post?

I don't have any encoding issue except on this query

Anyway I found out that if I put
validates :nome, :presence => true, :uniqueness => { :case_sensitive =>
false }

instead of
validates :nome, :presence => true, :uniqueness => true

The query changes and doesn't use BINARY anymore and guess what, it now
works

Here is the generated sql now

SELECT `clients`.`id` FROM `clients` WHERE (LOWER(`clients`.`nome`) =
LOWER('João Victor Santos')) LIMIT 1

It could be/ BINARY forces the comparison to be done byte by byte, but
(assuming the column is a utf8 one) the character ã can be represented
with more than one sequence of bytes.

Fred

Frederick Cheung wrote in post #969886:

> SELECT `clients`.* FROM `clients` WHERE (nome LIKE '%João Victor
> Santos%') ORDER BY nome ASC LIMIT 30 OFFSET 0

> It correctly returns 1 row

> I'm starting to think the problem is with the BINARY in the validates
> uniqueness query, anyone knows a way to monkey-patch it to remove the
> BINARY keyword?

That's probably not the issue. Did you investigate the encodings as I
suggested in my earlier post?

It could be/ BINARY forces the comparison to be done byte by byte, but
(assuming the column is a utf8 one) the character can be represented
with more than one sequence of bytes.

Oh, that's a really good point.

Fred

Best,