A Combined Semantic Search and Machine Learning Approach for Address Entity Resolution
EasyChair Preprint 832
7 pages • Date: March 15, 2019
Abstract
We have developed a comprehensive prototype solution for a specific use case: entity resolution for the mailing addresses of financial institutions. Our objective was to match misspelled or inaccurate user-entered addresses of business entities to their corresponding entries in a “gold copy” (dictionary) of complete and accurate mailing addresses. Three distinct matching methods (PySolr, SoDA and Record Linkage) provide a preliminary yet diverse set of lookups for finding matches. These lookups may optionally be followed by a search with a hybrid machine learning (ML) model that combines regularized logistic regression and hierarchical clustering, implemented with Dedupe. Our measurements of elapsed search times for the three lookup methods across a variety of match types suggest that the majority of the simpler matches are detected extremely fast (elapsed times of approximately 6 to 48 milliseconds) at the lookup stage, which makes this stage suitable for detecting simple, and possibly the most common, errors in user-entered mailing addresses. The ML model, on the other hand, is comparatively slower (elapsed times of approximately 174 to 201 milliseconds). Nevertheless, the hybrid ML model seems most suitable when a user-entered address contains multiple ambiguities and the preliminary lookup methods may therefore fail to detect possible matches. The precision and recall of the ML model on a sizeable test dataset are 0.89 and 0.94, respectively. These high scores suggest that the ML model can be applied successfully to entity resolution of mailing addresses.
Our combined solution can be integrated with any enterprise software application to provide an efficient and robust address matching service in cases where users enter mailing addresses as free-form text that may contain inaccuracies.
Keyphrases: Entity Resolution, Natural Language Processing, address entity resolution, entity resolution problem, gap distance, machine learning, matching method, semantic search
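The two-stage flow described in the abstract (a fast lookup first, with a slower and more tolerant matcher used only when the lookup fails) can be illustrated with a minimal sketch. This is not the authors' implementation: the standard-library `difflib.SequenceMatcher` stands in for both the lookup tools (PySolr, SoDA, Record Linkage) and the Dedupe-based ML model, and the sample addresses, the function names, and the 0.8 similarity threshold are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical "gold copy" of clean addresses (illustrative data only).
GOLD_ADDRESSES = [
    "100 Main Street, Springfield, IL 62701",
    "250 Market Avenue, Columbus, OH 43215",
    "75 Harbor Road, Portland, ME 04101",
]


def normalize(address: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in address.lower()
    )
    return " ".join(kept.split())


# Stage 1 index: normalized text -> gold entry (stand-in for the fast lookups).
EXACT_INDEX = {normalize(a): a for a in GOLD_ADDRESSES}


def fast_lookup(query: str):
    """Cheap lookup that only succeeds on formatting-level differences."""
    return EXACT_INDEX.get(normalize(query))


def fuzzy_fallback(query: str, threshold: float = 0.8):
    """Slower stand-in for the ML stage: best similarity above a threshold."""
    q = normalize(query)
    best, best_score = None, 0.0
    for candidate in GOLD_ADDRESSES:
        score = SequenceMatcher(None, q, normalize(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


def resolve(query: str, threshold: float = 0.8):
    """Try the fast lookup first; fall back to fuzzy matching only if needed."""
    match = fast_lookup(query)
    if match is not None:
        return match, "lookup"
    match = fuzzy_fallback(query, threshold)
    if match is not None:
        return match, "fallback"
    return None, "unmatched"
```

In this sketch, an entry that differs from the gold copy only in case or punctuation is resolved at the lookup stage, while an entry with several misspellings reaches the fallback stage. Ordering the stages this way mirrors the trade-off reported in the abstract, where the cheaper method handles the simple cases and the slower one is reserved for ambiguous input.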