Urm, okay, call me stupid Ishmael, but why not simply subclass the current htmlparser and then, whenever you get a 'bad tag', do whatever you want to do with it? I dare say that if someone passes me a badly formed document, I -want- them to see an error; however, whatever -you- decide to do with it is up to (well) -you-. If you want to try and 'fix' certain errors in a bad document, that's surely down to 'you'.
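Roughly what I mean, as a minimal sketch. I'm assuming Python's standard `html.parser.HTMLParser` here (swap in whichever parser you're actually using); `KNOWN_TAGS`, `BadTagError`, `handle_bad_tag` and both class names are made up for illustration and aren't part of the library.

```python
from html.parser import HTMLParser

# Hypothetical whitelist; a real one would list every tag you accept.
KNOWN_TAGS = {"html", "body", "form", "input", "p"}


class BadTagError(ValueError):
    """Raised when the document contains a tag we do not recognise."""


class StrictParser(HTMLParser):
    """Default policy: complain loudly about anything unexpected."""

    def __init__(self):
        super().__init__()
        self.seen = []  # tags accepted so far, in document order

    def handle_starttag(self, tag, attrs):
        if tag not in KNOWN_TAGS:
            # Let the hook decide: raise, or hand back a repaired tag name.
            tag = self.handle_bad_tag(tag, attrs)
        self.seen.append(tag)

    def handle_bad_tag(self, tag, attrs):
        # Hypothetical hook, not something HTMLParser itself defines.
        line, _offset = self.getpos()
        raise BadTagError(f"unknown tag <{tag}> at line {line}")


class ForgivingParser(StrictParser):
    """The caller's own policy: repair only the typos this caller knows about."""

    FIXES = {"inptu": "input"}

    def handle_bad_tag(self, tag, attrs):
        if tag in self.FIXES:
            return self.FIXES[tag]
        # Anything we cannot confidently repair is still an error.
        return super().handle_bad_tag(tag, attrs)
```

Feeding `<form><inptu name="fred"></form>` to `StrictParser` raises `BadTagError`, while `ForgivingParser` quietly records it as an `input`. Neither one has any way of knowing that `name="freds"` should have been `name="fred"`.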
You may get lucky and someone may have already trod this path, but surely in the case of 'bad data' you're not best placed to say what's 'valid' and what's not. Surely that's something only the originating user can do. I mean to say, you can deal with things like a missing '>' fairly simply, but what about character transposition (inptu instead of input), or character addition (<input name="freds"> instead of <input name="fred">)?
I think the -sanest- thing a parser can do is raise an error on badly formed input. Perhaps not the answer you want, and I look forward to being proved 'wrong', but, well, *polite shrug* there's my 2c ;p