Hello - I am interested in parsing arbitrary street addresses from strings (semi-clean voter lists, mainly). These data may show up in various formats, but there are several common patterns. Non-exhaustive examples:
12-123 Washington Ave Minneapolis MN 12345
12/A-123 Washington Hwy Minneapolis Minnesota
12 Washington Dr Minneapolis Minn 12345
12 Washington Ridge St ...
12/AB-123 Washington Blvd ...
12/A-123 Washington Pl ...
#12-123 Washington Rd E ...
1234/A Washington Ave ...
12B-123-A Washington St ...
etc...
My question is this: before I start cooking up a complex regexp to parse these strings into standard pieces (state, city, street name, street type, unit number, etc.), has someone already done this? Or is there some kind of toolkit to assist with parsing street addresses? Surely this is a very common problem that has been solved many times by now. Or is this type of data so irregular that it precludes syntactic analysis?
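
To make the question concrete, here is a minimal Python sketch of the kind of regexp I have in mind. Everything in it is an assumption for illustration only: the short `STREET_TYPES` list, the group names, and the `parse_address` helper are hypothetical, and it only splits off the number, street name, street type, and an optional trailing direction, leaving city/state/ZIP unparsed in `rest`.

```python
import re

# Hypothetical, deliberately tiny list of street types (assumption);
# a realistic list would be far longer.
STREET_TYPES = r"(?:Ave|Blvd|Dr|Hwy|Pl|Rd|St)"

ADDRESS_RE = re.compile(
    r"""
    ^\s*
    \#?(?P<number>\d[0-9A-Za-z]*(?:[/-][0-9A-Za-z]+)*)  # 12, 12-123, 12/A-123, 12B-123-A
    \s+
    (?P<street_name>.+?)                                # shortest name before a street type
    \s+
    (?P<street_type>""" + STREET_TYPES + r""")\b
    (?:\s+(?P<direction>[NSEW])\b)?                     # optional trailing direction
    \s*
    (?P<rest>.*)$                                       # city/state/ZIP left unparsed
    """,
    re.VERBOSE,
)


def parse_address(text):
    """Return a dict of address pieces, or None if the pattern does not match."""
    match = ADDRESS_RE.match(text)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for line in [
        "12-123 Washington Ave Minneapolis MN 12345",
        "12 Washington Ridge St ...",
        "#12-123 Washington Rd E ...",
    ]:
        print(parse_address(line))
```

Even this small pattern is already fragile (for example, a street name that itself contains a street-type word would be split in the wrong place), which is why I am asking whether an existing toolkit handles this better.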