dump and import MySQL table w/ accents

I am switching to a composite primary key (string and user ID) from the Rails
conventional auto-incrementing integer primary ID. The table is large (2.5
million records) and I'd rather not discard the contents. The
composite_primary_key gem doesn't appear to support altering the table with a
migration to do its magic, only creating a table from scratch. So I dumped
the table with mysqldump, ran the migration (table looks good), and am trying
to repopulate the table. It has accented characters and is complaining about
duplicates, apparently around words with and without accents, e.g., 'jose' and
'josé'. I've been deleting one by hand from the dump, but it is tedious and
very slow. Emacs crawls when dealing with very large files with very long
lines.

I just don't understand why the accents are causing problems. The string
column is utf8_general_ci collation, just like other fields in the database
with strings with accents. What do I need to specify so it will import the
dump? Is there a problem with strings with accents in composite indexes?

The table is created with a Rails migration, but everything else is pure MySQL
utilities.

TIA,
  Jeffrey

Ensure your database.yml file has a line like:

encoding: utf8

Quoting BenH <benhami@gmail.com>:

Ensure your database.yml file has a line like:

encoding: utf8

Thank you. It has for over a year. The data dumped was UTF-8, but something
is preventing it from being re-imported.

Sigh,
  Jeffrey

Quoting Jeffrey L. Taylor <ror@abluz.dyndns.org>:

I am switching to a composite primary key (string and user ID) from the Rails
conventional auto-incrementing integer primary ID. The table is large (2.5
million records) and I'd rather not discard the contents. The
composite_primary_key gem doesn't appear to support altering the table with a
migration to do its magic, only creating a table from scratch. So I dumped
the table with mysqldump, ran the migration (table looks good), and am trying
to repopulate the table. It has accented characters and is complaining about
duplicates, apparently around words with and without accents, e.g., 'jose' and
'josé'. I've been deleting one by hand from the dump, but it is tedious and
very slow. Emacs crawls when dealing with very large files with very long
lines.

I just don't understand why the accents are causing problems. The string
column is utf8_general_ci collation, just like other fields in the database
with strings with accents. What do I need to specify so it will import the
dump? Is there a problem with strings with accents in composite indexes?

The answer is mixed and bits of it don't make sense. If I work with the MySQL
client (mysql), I can create records with primary keys identical, except for
an accent. Doing the same thing in the Rails console throws an DuplicateKey
exception. Using mysqldump to dump the contents of a table, changing the
table so the primary key is a integer-string composite and trying to
repopulate the table from the mysqldump output also barfs on a duplicate key
exception.

It is non-optimal, but I can live for the moment with a table with two
non-unique indexes.

Jeffrey

Quoting Wisccal Wisccal <rails-mailing-list@andreas-s.net>:

Wisccal Wisccal wrote:
> Jeffrey L. Taylor wrote:
>
> As far as I understand, utf8_general_ci is case-insensitive.

I meant to say "accent-insensitive"...

Thank you. I didn't think MySQL would messup that badly. I'll try this out
this week. I just checked and my development and production machines have
different collation, which explains the inconsistent results I was seeing in
my tests.

Jeffrey