ANR 2017 Project (ANR-17-CE23-0007-01)
Coordinator: Manuel Atencia (Univ. Grenoble Alpes, CNRS, Inria, LIG, F-38000 Grenoble, France)
Society at large demands access to data held by various bodies: governments, universities, cultural actors, etc. This has led to the release of a vast quantity of linked data, i.e., data expressed in semantic web formalisms (RDF). Part of the added value of linked data lies in the links identifying the same entity in different datasets. For instance, they may identify the same books and articles in different bibliographic data sources. Links make it possible to exploit the content of data sources jointly and to draw inferences across datasets. Thus, finding the manifestations of the same entity across several datasets is a crucial task for linked data.
One novel way to generate such links is to extract and use link keys. Link keys generalise database keys in two independent directions: they deal with data in RDF, and they apply across two datasets. The goal of ELKER is to extend the foundations and algorithms of link keys in two complementary ways: extracting link keys automatically from datasets and reasoning with link keys.
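As an illustration of what a link key expresses, the following minimal Python sketch applies one to two toy datasets. It assumes a simplified reading in which a link key is a set of property pairs, and two resources are linked when they share at least one value on every pair. The dictionary-based dataset encoding, the function name generate_links and the sample data are hypothetical and are not part of the ELKER software.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory stand-in for an RDF dataset:
# subject identifier -> property name -> set of values.
Dataset = Dict[str, Dict[str, Set[str]]]
# A link key as a list of (property in left dataset, property in right dataset).
LinkKey = List[Tuple[str, str]]


def generate_links(left: Dataset, right: Dataset, link_key: LinkKey) -> Set[Tuple[str, str]]:
    """Return the subject pairs that share a value on every property pair."""
    if not link_key:
        return set()  # an empty key would otherwise link every pair
    links = set()
    for a, a_props in left.items():
        for b, b_props in right.items():
            # Assumed semantics: at least one common value for each property pair.
            if all(a_props.get(p, set()) & b_props.get(q, set()) for p, q in link_key):
                links.add((a, b))
    return links


# Illustrative data only: two bibliographic sources using different vocabularies.
library_a: Dataset = {
    "a:b1": {"title": {"Dune"}, "creator": {"Frank Herbert"}},
    "a:b2": {"title": {"Emma"}, "creator": {"Jane Austen"}},
}
library_b: Dataset = {
    "b:x": {"label": {"Dune"}, "author": {"Frank Herbert"}},
    "b:y": {"label": {"Dune"}, "author": {"Brian Herbert"}},
}

book_key: LinkKey = [("title", "label"), ("creator", "author")]
links = generate_links(library_a, library_b, book_key)
print(links)
```

Here the title alone would not be a usable key, since two different books in the second source carry the same label; adding the creator/author pair removes the ambiguity.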
Concerning link key extraction, ELKER will delve into the parallel between link key extraction and formal concept analysis. This will make it possible to extend the types of link keys that can be extracted and to take advantage of optimised extraction procedures. We will also deal with dependent link keys, which occur naturally when the classes in ontologies are interdependent. For that purpose, we will consider the procedures defined for relational concept analysis and adapt them to link keys. We will also develop a fixed point semantics for link keys that depend on each other, which would allow more links to be generated. Finally, we will explore description building techniques for optimising extraction, i.e., taking advantage of the quality measures used for selecting link keys during the extraction process so that the search space can be reduced.
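One way to read the parallel with formal concept analysis is to take pairs of resources as objects and pairs of properties as attributes, an object having an attribute when the two resources share a value on that property pair. The sketch below, under that assumption, collects the agreement set of each resource pair and uses each distinct agreement set as a candidate link key together with the links it would generate. It is a naive quadratic enumeration on hypothetical data: it does not close the candidates under intersection as a full concept lattice would, and it does not compute any quality measure.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical dataset encoding: subject -> property -> set of values.
Dataset = Dict[str, Dict[str, Set[str]]]
PropertyPair = Tuple[str, str]
Link = Tuple[str, str]


def agreement_set(a_props: Dict[str, Set[str]], b_props: Dict[str, Set[str]]) -> FrozenSet[PropertyPair]:
    """Property pairs on which two resources share at least one value."""
    return frozenset(
        (p, q)
        for p, p_values in a_props.items()
        for q, q_values in b_props.items()
        if p_values & q_values
    )


def candidate_link_keys(left: Dataset, right: Dataset) -> Dict[FrozenSet[PropertyPair], Set[Link]]:
    """Map each distinct non-empty agreement set to the links it generates."""
    agreements: Dict[Link, FrozenSet[PropertyPair]] = {}
    for a, a_props in left.items():
        for b, b_props in right.items():
            shared = agreement_set(a_props, b_props)
            if shared:
                agreements[(a, b)] = shared

    candidates: Dict[FrozenSet[PropertyPair], Set[Link]] = defaultdict(set)
    for key in set(agreements.values()):
        for link, shared in agreements.items():
            # A candidate generates every pair agreeing on at least its property pairs.
            if key <= shared:
                candidates[key].add(link)
    return dict(candidates)


# Illustrative data only.
library_a: Dataset = {
    "a:b1": {"title": {"Dune"}, "creator": {"Frank Herbert"}},
    "a:b2": {"title": {"Emma"}, "creator": {"Jane Austen"}},
}
library_b: Dataset = {
    "b:x": {"label": {"Dune"}, "author": {"Frank Herbert"}},
    "b:y": {"label": {"Dune"}, "author": {"Brian Herbert"}},
}

candidates = candidate_link_keys(library_a, library_b)
for key, generated in candidates.items():
    print(sorted(key), "->", sorted(generated))
```

A quality measure applied to each candidate's generated links could then be used to select among candidates, or to prune the enumeration early, which is the kind of optimisation the paragraph above refers to.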
Regarding reasoning with link keys, ELKER will extend description logics techniques for reasoning with ontologies, data and link keys. Tableau methods for description logics will be adapted to infer axioms and link keys from ontologies and link keys. We will also consider distributing this reasoning process for the case where ontologies and datasets cannot be centralised. Such techniques may be used off-line for generating new link keys that can be evaluated on data. For high-throughput link generation, we will transform link keys into Datalog rules, using an adaptation of probabilistic Datalog that can carry the uncertainty of link keys and axioms.
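To make the translation into Datalog concrete, here is a minimal sketch of how a link key between two classes could be rendered as a plain Datalog rule. The rule shape, the predicate names and the function link_key_to_datalog are assumptions made for illustration; the probabilistic adaptation mentioned above, which would attach uncertainty to rules and facts, is not modelled.

```python
from typing import List, Tuple


def link_key_to_datalog(
    link_key: List[Tuple[str, str]],
    left_class: str,
    right_class: str,
    head: str = "sameAs",
) -> str:
    """Render a link key as one Datalog rule deriving identity links.

    Each property pair (p, q) contributes two body atoms sharing a value
    variable, so the rule fires only when both resources have a common value.
    """
    if not link_key:
        raise ValueError("a link key needs at least one property pair")
    body = [f"{left_class}(X)", f"{right_class}(Y)"]
    for i, (p, q) in enumerate(link_key, start=1):
        body.append(f"{p}(X, V{i})")
        body.append(f"{q}(Y, V{i})")
    return f"{head}(X, Y) :- " + ", ".join(body) + "."


# Hypothetical class and property names.
rule = link_key_to_datalog(
    [("title", "label"), ("creator", "author")],
    left_class="book",
    right_class="volume",
)
print(rule)
```

Once in this form, links can be produced by any Datalog engine evaluating the rule over facts extracted from the two datasets.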
The theoretical outcomes of ELKER will be implemented and integrated into software maintained by the partners, and these tools will be connected together. They will be distributed as open source software. Moreover, the designed methods and tools will be evaluated both on specifically designed benchmarks that test the unique aspects of link keys and on real-world datasets.
The ELKER consortium comprises three complementary teams specialising in data interlinking and semantic web technologies and models, formal concept analysis, and reasoning in description logics.