dump and import MySQL table w/ accents

I am switching to a composite primary key (string and user ID) from the Rails conventional auto-incrementing integer primary ID. The table is large (2.5 million records) and I'd rather not discard the contents. The composite_primary_key gem doesn't appear to support altering the table with a migration to do its magic, only creating a table from scratch. So I dumped the table with mysqldump, ran the migration (table looks good), and am trying to repopulate the table. It has accented characters and is complaining about duplicates, apparently around words with and without accents, e.g., 'jose' and 'josé'. I've been deleting one by hand from the dump, but it is tedious and very slow. Emacs crawls when dealing with very large files with very long lines.

I just don't understand why the accents are causing problems. The string column is utf8_general_ci collation, just like other fields in the database with strings with accents. What do I need to specify so it will import the dump? Is there a problem with strings with accents in composite indexes?

The table is created with a Rails migration, but everything else is pure MySQL utilities.

TIA,   Jeffrey

Ensure your database.yml file has a line like:

encoding: utf8

Quoting BenH <benhami@gmail.com>:

> Ensure your database.yml file has a line like:
>
> encoding: utf8

Thank you. It has for over a year. The data dumped was UTF-8, but something is preventing it from being re-imported.

Sigh,   Jeffrey

Quoting Jeffrey L. Taylor <ror@abluz.dyndns.org>:

> I am switching to a composite primary key (string and user ID) from the Rails conventional auto-incrementing integer primary ID. The table is large (2.5 million records) and I'd rather not discard the contents. The composite_primary_key gem doesn't appear to support altering the table with a migration to do its magic, only creating a table from scratch. So I dumped the table with mysqldump, ran the migration (table looks good), and am trying to repopulate the table. It has accented characters and is complaining about duplicates, apparently around words with and without accents, e.g., 'jose' and 'josé'. I've been deleting one by hand from the dump, but it is tedious and very slow. Emacs crawls when dealing with very large files with very long lines.
>
> I just don't understand why the accents are causing problems. The string column is utf8_general_ci collation, just like other fields in the database with strings with accents. What do I need to specify so it will import the dump? Is there a problem with strings with accents in composite indexes?

The answer is mixed, and parts of it don't make sense. If I work with the MySQL client (mysql), I can create records whose primary keys are identical except for an accent. Doing the same thing in the Rails console throws a DuplicateKey exception. Using mysqldump to dump the contents of the table, changing the table so the primary key is an integer-string composite, and then trying to repopulate the table from the mysqldump output also fails with a duplicate key error.

It is non-optimal, but I can live for the moment with a table with two non-unique indexes.

Jeffrey

Quoting Wisccal Wisccal <rails-mailing-list@andreas-s.net>:

Wisccal Wisccal wrote:
> Jeffrey L. Taylor wrote:
>
> As far as I understand, utf8_general_ci is case-insensitive.

I meant to say "accent-insensitive"...
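To make the point concrete, here is a rough Python sketch of what an accent-insensitive, case-insensitive comparison does to the two keys from the original question. The `fold` helper is hypothetical and only approximates the idea by stripping combining marks and case-folding; it does not reproduce MySQL's actual utf8_general_ci weight tables, which differ for some characters.

```python
import unicodedata


def fold(text):
    """Return an approximate accent- and case-insensitive comparison key.

    Assumption: this imitates the *idea* behind utf8_general_ci, not its
    exact rules.
    """
    # Decompose characters such as 'é' into 'e' plus a combining accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Ignore case as well.
    return stripped.casefold()


plain = "jose"
accented = "josé"

# Compared code point by code point, the strings differ.
print(plain == accented)
# Compared through the folded key, they are the same value, which is
# why a unique index under such a collation sees a duplicate.
print(fold(plain) == fold(accented))
```

Under a comparison like this, a primary key or unique index cannot hold both 'jose' and 'josé' for the same user ID, no matter how the bytes are encoded.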

Thank you. I didn't think MySQL would mess up that badly. I'll try this out this week. I just checked, and my development and production machines have different collations, which explains the inconsistent results I was seeing in my tests.

Jeffrey
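
As an alternative to deleting rows by hand from the dump, the colliding keys could be listed up front. The following is a minimal sketch, assuming the (user ID, string) pairs have already been extracted from the table or dump into Python tuples; `fold`, `find_collisions`, and the sample rows are all hypothetical, and the folding only approximates an accent-insensitive collation rather than matching MySQL exactly.

```python
import unicodedata
from collections import defaultdict


def fold(text):
    """Approximate accent- and case-insensitive key (not MySQL's exact rules)."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def find_collisions(rows):
    """Map each folded (user_id, name) key to the original names sharing it.

    Only keys shared by more than one row are returned.
    """
    groups = defaultdict(list)
    for user_id, name in rows:
        groups[(user_id, fold(name))].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


# Made-up sample data standing in for rows taken from the dump.
rows = [(1, "jose"), (1, "josé"), (2, "jose"), (1, "maria")]

for (user_id, folded), names in find_collisions(rows).items():
    print(user_id, folded, names)
```

Each reported group is a set of rows that would conflict under a composite unique key with an accent-insensitive collation, so they can be reviewed or merged before the import is attempted.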