Problem with validates uniqueness and accented characters

I have a table 'client' with a unique index on column 'nome' I have also set validates :uniqueness => true on column 'nome'

There is a record 'João Victor Santos' on my clients table. When I try to insert another 'João Victor Santos' Rails executes the following command to check if a record with this name already exists:

SELECT `clients`.`id` FROM `clients` WHERE (`clients`.`nome` = BINARY 'João Victor Santos') LIMIT 1

Unfortunately this query returns 0 rows and validation passes and I receive this stack trace

Mysql::Error: Duplicate entry 'João Victor Santos' for key 2

If I execute this query in phpMyAdmin it correctly returns 1 result

SELECT `clients`.`id` FROM `clients` WHERE (`clients`.`nome` = 'João Victor Santos') LIMIT 1

I'm using Rails 3.0.3, Ruby 1.8.7 and the mysql gem. I can't switch to mysql2 gem because I'm running on Windows and this application will also run on Windows in production

Has anyone faced this problem?

Victor Cisneiros wrote in post #969792:

I have a table 'client' with a unique index on column 'nome' I have also set validates :uniqueness => true on column 'nome'

There is a record 'João Victor Santos' on my clients table. When I try to insert another 'João Victor Santos' Rails executes the following command to check if a record with this name already exists:

SELECT `clients`.`id` FROM `clients` WHERE (`clients`.`nome` = BINARY 'João Victor Santos') LIMIT 1

Unfortunately this query returns 0 rows and validation passes and I receive this stack trace

Mysql::Error: Duplicate entry 'João Victor Santos' for key 2

If I execute this query in phpMyAdmin it correctly returns 1 result

SELECT `clients`.`id` FROM `clients` WHERE (`clients`.`nome` = 'João Victor Santos') LIMIT 1

Perhaps you've got an encoding issue.

I'm using Rails 3.0.3, Ruby 1.8.7 and the mysql gem. I can't switch to mysql2 gem because I'm running on Windows and this application will also run on Windows in production

You should absolutely not be running Rails applications on Windows in production. Find a way to host the app on *nix if at all possible.

Has anyone faced this problem?

Best,

I tried running on my linux VPS with Ruby 1.9.2 and the mysql2 gem and the problem still persists

The funny thing is that I have a view where you can filter all the clients and if I make it filter this particular client by running this query

SELECT `clients`.* FROM `clients` WHERE (nome LIKE '%João Victor Santos%') ORDER BY nome ASC LIMIT 30 OFFSET 0

It correctly returns 1 row

I'm starting to think the problem is with the BINARY in the validates uniqueness query, anyone knows a way to monkey-patch it to remove the BINARY keyword?

Please quote when replying.

Victor Cisneiros wrote in post #969879:

I tried running on my linux VPS with Ruby 1.9.2 and the mysql2 gem and the problem still persists

The funny thing is that I have a view where you can filter all the clients and if I make it filter this particular client by running this query

SELECT `clients`.* FROM `clients` WHERE (nome LIKE '%João Victor Santos%') ORDER BY nome ASC LIMIT 30 OFFSET 0

It correctly returns 1 row

I'm starting to think the problem is with the BINARY in the validates uniqueness query, anyone knows a way to monkey-patch it to remove the BINARY keyword?

That's probably not the issue. Did you investigate the encodings as I suggested in my earlier post?

Best,

That's probably not the issue. Did you investigate the encodings as I suggested in my earlier post?

I don't have any encoding issue except on this query

Anyway I found out that if I put validates :nome, :presence => true, :uniqueness => { :case_sensitive => false }

instead of validates :nome, :presence => true, :uniqueness => true

The query changes and doesn't use BINARY anymore and guess what, it now works

Here is the generated sql now

SELECT `clients`.`id` FROM `clients` WHERE (LOWER(`clients`.`nome`) = LOWER('João Victor Santos')) LIMIT 1

It could be/ BINARY forces the comparison to be done byte by byte, but (assuming the column is a utf8 one) the character ã can be represented with more than one sequence of bytes.

Fred

Frederick Cheung wrote in post #969886:

> SELECT `clients`.* FROM `clients` WHERE (nome LIKE '%João Victor > Santos%') ORDER BY nome ASC LIMIT 30 OFFSET 0

> It correctly returns 1 row

> I'm starting to think the problem is with the BINARY in the validates > uniqueness query, anyone knows a way to monkey-patch it to remove the > BINARY keyword?

That's probably not the issue. Did you investigate the encodings as I suggested in my earlier post?

It could be/ BINARY forces the comparison to be done byte by byte, but (assuming the column is a utf8 one) the character can be represented with more than one sequence of bytes.

Oh, that's a really good point.

Fred

Best,