Parsing Street Addresses

Hello - I am interested in parsing arbitrary street addresses from strings (semi-clean voter lists, mainly). These data may show up in various formats, but there are several common patterns. Non-exhaustive examples:

12-123 Washington Ave Minneapolis MN 12345
12/A-123 Washington Hwy Minneapolis Minnesota
12 Washington Dr Minneapolis Minn 12345
12 Washington Ridge St ...
12/AB-123 Washington Blvd ...
12/A-123 Washington Pl ...
#12-123 Washington Rd E ...
1234/A Washington Ave ...
12B-123-A Washington St ...
etc...

My question is this: before I start cooking up a complex regexp to parse these strings into standard pieces (state, city, street name, street type, unit number, etc.), has someone already done this? Or is there some kind of toolkit to assist with parsing street addresses? Surely this is a very common problem and must have been solved many times by now. Or perhaps this type of data is too irregular for syntactic analysis?


Why not just split the strings on spaces?
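
For what it's worth, here is a rough Python sketch of what a plain whitespace split gives on two of the sample strings from the question. The helper name `split_address` is made up for illustration. It shows that splitting yields the tokens easily enough, but a token's position alone does not say whether it is part of the street name or the street type.

```python
def split_address(line: str) -> list[str]:
    """Split an address string into whitespace-separated tokens."""
    return line.split()


# Two of the sample strings from the question.
a = split_address("12 Washington Dr Minneapolis Minn 12345")
b = split_address("12 Washington Ridge St")

# Position 2 is the street type in the first string...
print(a[2])
# ...but part of a two-word street name in the second.
print(b[2])
```

So a split is a reasonable first step, but something still has to decide what each token means, for example by checking tokens against a list of known street types.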

The USPS might be able to help somewhat:


I did some more searching and found a very useful website:

They store hundreds of standard regular expressions for a variety of purposes. I searched on "address", "postal", etc. and found some patterns to start with. I think this site could be useful for anyone who needs to get started quickly with regexp parsing.
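
In case it helps anyone landing here, below is a minimal Python sketch of the kind of starting pattern I mean. It is not one of the patterns from that site. The names `ADDRESS_RE` and `parse_address` and the group names are my own, the street-type list covers only the abbreviations in my sample strings, and everything after the street is lumped into one `rest` field instead of being split into city, state, and ZIP.

```python
import re

# Assumption: street types are limited to the abbreviations in the samples.
ADDRESS_RE = re.compile(
    r"""
    ^\s*
    (?P<number>\#?\d+[A-Za-z]?(?:[/-][A-Za-z0-9]+)*)  # 12, 12/A-123, 12B-123-A, ...
    \s+
    (?P<street_name>.+?)                              # shortest run before a street type
    \s+
    (?P<street_type>Ave|Hwy|Dr|St|Blvd|Pl|Rd)\b
    (?:\s+(?P<direction>[NSEW])\b)?                   # optional one-letter directional
    (?:\s+(?P<rest>.*\S))?                            # city/state/ZIP, left unparsed
    \s*$
    """,
    re.VERBOSE,
)


def parse_address(line):
    """Return a dict of address parts, or None if the line does not match."""
    m = ADDRESS_RE.match(line)
    return m.groupdict() if m else None


print(parse_address("12-123 Washington Ave Minneapolis MN 12345"))
print(parse_address("#12-123 Washington Rd E"))
print(parse_address("12 Washington Ridge St"))
```

Because the street name is matched lazily, the first token that looks like a street type ends the name, so a street whose name itself begins with one of those abbreviations would be parsed wrongly. Treat this as a starting point to test against real data, not a finished parser.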