Advice on restructuring

Ive inherited a table called contacts which is used throughout the site. There are a lot of duplicates in this table we want to remove(i.e. the email field was not set to unique so multiple accounts were created etc etc).

The plan is to select the contact we want to keep and then select the ones we want to remove and transfer the needed data over before deleting the unwanted records. The trouble with this is that there a lots of associated tables, complexed self referential joins etc that the contacts table relates to. If its not done correctly then further down the line there could be serious issues as data may be lost etc...

A thought of mine was to simply merge them but instead of deleting the record we would point it to the main record we are keeping and then set the records we dont need to "inactive" or something so we know they are no longer used and point to an active record.

I can think of how to set it this up in the table but can anyone think of a super cool way to make sure the inactive record always returns the active record it points to when ever it its referenced or needed?

Any advice appreciated,


I don’t think you’re realistically going to avoid having a huge headache if you don’t do the migration correctly the first time. And if you try to hack in some automatic record reconciliation logic into your system, you will forever pay the price of that in terms of application complexity, performance, inflexibility, and unmaintainability moving forward.

My recommendation is to make extra sure your migration logic from the old schema/table to the new one is 100% correct, then backup your DB, do your migration, and live a happy new life. Get your data clean, don’t hamper yourself trying to sit on the fence between fixing the data and leaving it in a broken state.

Also, I’m not aware of any technology (Rails or DB-level) which will let you somehow automatically alias old rows to new rows when doing joins. Joins are not that smart. You’d probably need to write custom SQL which gets the initial results, then checks them for bad rows, then patches in correct data where needed, etc. It’d be a mess!

my 2c