Guided Inductive Logic Programming: Cleaning Knowledge Bases with Iterative User Feedback

15 pages • Published: April 27, 2020

Abstract

Domain-oriented knowledge bases (KBs) such as DBpedia and YAGO are largely constructed by applying a set of predefined extraction rules to the semi-structured contents of Wikipedia articles. Although both of these large-scale KBs achieve very high average precision (above 95% for YAGO3), subtle mistakes in a few of the underlying extraction rules may still introduce a substantial number of systematic extraction errors for specific relations. For example, by applying the same regular expressions to extract the names of persons of both Asian and Western nationality, YAGO erroneously swaps the family and given names of most Asian person entities. Traditional rule-learning approaches based on Inductive Logic Programming (ILP) have great difficulty detecting these systematic extraction errors, since they usually occur only in a relatively small subdomain of the relations’ arguments. In this paper, we thus propose a guided form of ILP, coined “GILP”, that iteratively asks for small amounts of user feedback over a given KB to learn a set of data-cleaning rules that (1) best match the feedback and (2) also generalize to a larger portion of facts in the KB. We propose both algorithms and corresponding metrics to automatically assess the quality of the learned rules with respect to the user feedback.

Keyphrases: data cleaning, feedback, knowledge bases, rule learning

In: Gregoire Danoy, Jun Pang and Geoff Sutcliffe (editors). GCAI 2020. 6th Global Conference on Artificial Intelligence (GCAI 2020), vol 72, pages 92-106.
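The feedback loop described in the abstract can be illustrated with a toy sketch. This is not the GILP algorithm from the paper: the `Fact` and `Rule` types, the single-attribute rule form, the `oracle` callback standing in for the user, and the precision and coverage scores are all assumptions made here for illustration. It only shows the general idea of labelling a few facts per round, proposing rules from the facts flagged as erroneous, and keeping rules that match the feedback while also covering unlabelled facts.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Fact:
    fact_id: int
    subject: str
    nationality: str  # hypothetical attribute used as the rule condition


@dataclass(frozen=True)
class Rule:
    # Hypothetical cleaning rule: "facts about subjects of this nationality are erroneous".
    nationality: str

    def matches(self, fact: Fact) -> bool:
        return fact.nationality == self.nationality


def score_rule(rule: Rule, kb: List[Fact], feedback: Dict[int, bool]) -> Tuple[float, int]:
    """Return (precision on the labelled facts, number of KB facts the rule covers)."""
    labelled = [f for f in kb if f.fact_id in feedback and rule.matches(f)]
    flagged = sum(1 for f in labelled if feedback[f.fact_id])
    precision = flagged / len(labelled) if labelled else 0.0
    coverage = sum(1 for f in kb if rule.matches(f))
    return precision, coverage


def guided_loop(
    kb: List[Fact],
    oracle: Callable[[Fact], bool],  # stands in for the user; True means "erroneous"
    batch_size: int = 2,
    rounds: int = 3,
    min_precision: float = 0.8,
) -> Tuple[List[Tuple[Rule, float, int]], Dict[int, bool]]:
    feedback: Dict[int, bool] = {}
    accepted: List[Tuple[Rule, float, int]] = []
    for _ in range(rounds):
        # Ask for a small batch of labels on facts not yet reviewed.
        unlabelled = [f for f in kb if f.fact_id not in feedback]
        for fact in unlabelled[:batch_size]:
            feedback[fact.fact_id] = oracle(fact)
        # Propose one candidate rule per attribute value seen on a flagged fact.
        candidates = {Rule(f.nationality) for f in kb if feedback.get(f.fact_id)}
        scored = [(r, *score_rule(r, kb, feedback)) for r in candidates]
        # Keep rules that agree with the feedback; prefer higher precision, then coverage.
        accepted = sorted(
            (s for s in scored if s[1] >= min_precision),
            key=lambda s: (-s[1], -s[2], s[0].nationality),
        )
    return accepted, feedback
```

Calling `guided_loop` with a small list of `Fact` objects and an oracle that flags one nationality returns the accepted rules together with their scores; a rule's coverage may exceed the number of labelled facts, which is the sense in which it generalizes beyond the feedback.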