Rajdeep Sengupta
Blake Marsh
Jacob Dice

Address Validation Techniques with GIS Data

Tracking entities by address over time is often difficult because addresses are reported inconsistently. We propose a two-step process to clean and verify address data using Python-based tools. First, we standardize address components and convert each address to a set of coordinates using open-source APIs. Second, using scikit-learn, we build clusters of points that fall within a given distance tolerance of each set of coordinates, which compensates for variations over time and identifies records that refer to the same entity. We apply these methods to the Summary of Deposits data, a database of bank branch locations commonly used in banking research. The process improves the identification of branch openings, closings, and relocations over time, which is of interest to policymakers and researchers.
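The two steps can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a small, hypothetical suffix table for standardization, skips the geocoding call entirely (the coordinates below are made-up illustration values, not Summary of Deposits records), and assumes scikit-learn's DBSCAN with the haversine metric as the clustering method, since the abstract names only the library. The function names and the 100-meter tolerance are also assumptions.

```python
import re

import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters

# Hypothetical abbreviation table; a real pipeline would need a fuller one.
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "BOULEVARD": "BLVD", "ROAD": "RD"}


def standardize(address: str) -> str:
    """Uppercase, drop punctuation, collapse whitespace, abbreviate suffixes."""
    cleaned = re.sub(r"[^\w\s]", "", address.upper())
    return " ".join(SUFFIXES.get(tok, tok) for tok in cleaned.split())


def cluster_points(latlon_deg, tolerance_m: float = 100.0):
    """Return one cluster label per (lat, lon) point given in degrees.

    With min_samples=1 every point belongs to a cluster, and points are
    grouped when they can be chained together by hops of at most tolerance_m.
    """
    coords = np.radians(np.asarray(latlon_deg, dtype=float))
    model = DBSCAN(
        eps=tolerance_m / EARTH_RADIUS_M,  # haversine distances are in radians
        min_samples=1,
        metric="haversine",
        algorithm="ball_tree",
    )
    return model.fit_predict(coords)


# Made-up records: the first two differ only in how the address was reported.
records = [
    ("123 Main Street.", (41.8800, -87.6300)),
    ("123 main st", (41.8803, -87.6301)),
    ("500 Lake Avenue", (41.9000, -87.6500)),
]

clean_addresses = [standardize(addr) for addr, _ in records]
labels = cluster_points([point for _, point in records])

for addr, label in zip(clean_addresses, labels):
    print(label, addr)
```

In this sketch the first two records standardize to the same string and land in the same cluster, while the third stays separate. Because a single-point cluster is allowed, the tolerance alone controls how aggressively nearby records are merged, so it would need to be tuned against known relocations rather than taken from this example.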