What's a good way to temporarily store invalid records for validating when importing data?

I’m currently trying to setup daily processes that will import information into my database. I have many records that fail validations, and for the most part I have methods that attempt to auto-correct them and re-save, but there are some more complex scenarios that require a user to correct some information before they’re valid. I’m trying to come up with a good way in which I can set this up and I’ve come up with two methods, but I’m not sure which I should really go with…

Scenario 1:

I figure I could import into my models like I normally do, and when they don’t pass validations then I will go ahead and save them to the database anyways and then flag them with a boolean column as “invalid”. Then I’ll setup an action and view to allow the user to see and correct all the records that have been flagged.

Pros: The pro to this method is I don’t have to create various new models/tables for each model that I’m importing into for temporary storage.

Cons: I’m still storing bad data in the table, and also would have to alter many queries to filter out invalid records based on the boolean column I would add to the model.

Scenario 2:

I create new models that mimic the model that I want to import the records into… but then I’m adding alot of logic that is shared between the two – not to mention I have a lot of additional files and code. However, this method wouldn’t have me storing invalid data in my tables.

Thoughts?

Generally speaking i think you should NOT just allow invalid records AT ALL. For the record to be saved the user MUST ENSURE that the relevant information are valid.

Yes, I agree I just didn’t want to have to create an additional model for each of these models I would be importing and everything that would go with that. I think what would likely work is I can create a table in between my application table and the table I’m importing data from. On that table I could add a column that I can note records that are invalid. Then I can create a new action on my controller that reads those to a view in which I can them send them to the model’s _form.html.erb for the user to correct and attempt to save.

Avoid BY ALL MEANS any procedure that might complicate maintenance for you both short term and long term. I would advice you that you reconsider the approach. Rails has good validation methods that will make your programming life sweet and fun. What you are doing doesn’t seem like fun to me. With Rails you can make tables to validated each other also perform custom validations etc. However when you say import are you importing records from an old built data-source?

You don’t need to create another table also. You just need a custom validation method that handles the checking of the imported data. And in a before_save this can be implemented. Also If there is an error send a notification to the owner of the record via email or when next the user tries to access the application redirect the user to make corrections before they proceed. **Scenario 1 **is seem good but don’t play with invalid data in your database.

I get my data from a database that I have access to that is updated at different points during the day. As far as having a custom validation method, I have that. Here’s a, hopefully, more descriptive example of what is going on.

I have my Rails app, and I import records into that database on a daily basis from a database I have no control over. During importing of those records some pass my validations and some do not. I have some methods in place that will attempt to correct the column’s data that is failing my validation, but sometimes they can’t be corrected unless they have a user manually look into it and correct that data. I’m attempting to come up with a solution that allows me to get a list of those records that failed validation to allow a user to use my app to change the problem information and then attempt to save the record to the database.

Since you don’t have control over the originating data-source…like i said Scenario 1 would work just fine.

Thanks!

Scenario 3: create a another database. dump the invalids there.

kind regards --botp

Can you access the other database 'live'? If so then presumably you have some method of knowing which records in the external database have been successfully imported. Again if so then extend that method to include an indication of records that you attempted to import but which failed. Then in your app you can use that to fetch the data from the failed external records again and allow the user to correct it.

If you cannot access the external db live then my preferred option would be to go for an intermediate table. The import operation would import into that table and a second phase would move records from that into the final table. Any records which could not be moved would be left in the intermediate table for later correction.

Colin

I do have access to it “live” in a sense. It gets replicated down to a server I have access to at different times of the day. I don’t currently have a way of knowing which records have changed so I’m going to have to do some comparisons to see what, if anything, has changed for each record. I could create the boolean column on that table though and if it fails validation I’ll just flag the record. This sounds like a good plan!

Hi David,

I ran into this situation as well, and went with another solution. I don’t want bad data in my new database, and the import process presents an opportunity to clean up the source database. So what I did is collect the records that fail validation in a Hash, then at the end of the import process, send the Hash to a Mailer class. The StaffMailer class sends an email to the staff who maintain the source database, and has a summary of records with missing/bad data, asking them to fix it. This way, the old (source) database gets cleaned up, and the new one never gets polluted with invalid records.

Cheers,

Brent