Data journalism refers to investigative journalism carried out with data drawn from varied information sources: public databases, such as Eurostat or nosdeputes.fr; public knowledge bases, such as Yago or Geonames; archived media content within a newspaper’s library; or, more generally, information sources that newsrooms gather or receive, e.g., political programs of politicians, web sites of interest, posts on social networks, WikiLeaks material.
Looking at the bigger picture, data journalism is an emblematic example of the many real-life situations where analysts (the journalists in our case) have to explore and analyze vast amounts of heterogeneous information sources to gain insight into a question and make informed decisions. The difficulty of analytics in this type of situation comes from the conjunction of several factors: the number of information sources to take into account; their heterogeneity; possibly, their timeliness; and, last but not least, the complexity of the phenomena that analysts are looking at. A direct consequence of the first and last points is the need for collaborative work to decipher large-scale, complex, heterogeneous information sources so as to collectively gain insight.
As of today, analytics on heterogeneous information sources lacks integrated formalisms and approaches that make it possible to seamlessly leverage data, content and knowledge, collaboratively or not, and analysts have to rely on tools from distinct scientific worlds.
Departing from current approaches, the key idea of the iCODA project is to integrate and link all information sources within a unified graph, where nodes represent entities within information sources (content fragments, URIs in knowledge bases, tables in relational databases, etc.) and edges represent the relationships between them.
Graphs are a natural tool to represent how diverse elements interplay, with vertices of varying types and edges carrying different meanings. Directions associated with edges typically materialize the inherent asymmetry in the relations between elements. Graphs in iCODA will, for example, encompass RDF knowledge graphs and rules, content similarity graphs, and relations to data and databases. We will exploit this convergence of formalisms to leverage existing graph mining and graph visualization algorithms.
This graph model is at the core of the iCODA vision, with research questions around its construction, its querying and its interactive manipulation. The resulting workflow builds on three main groups of functionalities that all take input from the data model (the graph) and possibly modify it. This interaction through the data model is crucial: it makes it possible to iteratively and interactively modify the state of knowledge on a problem and, as a consequence, update the underlying graph.
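To make the idea of a unified graph more concrete, the following is a minimal sketch in Python, not the iCODA data model itself. It assumes a directed graph with typed nodes and labeled edges; the class names, node kinds, edge labels and identifiers (`UnifiedGraph`, `content_fragment`, `mentions`, `article:42`, etc.) are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """An entity from some information source (illustrative names only)."""
    ident: str  # e.g. a URI, a table name, or a content-fragment id
    kind: str   # e.g. "content_fragment", "kb_uri", "table"


@dataclass
class UnifiedGraph:
    """Directed graph with typed nodes and labeled edges (hypothetical model)."""
    nodes: dict = field(default_factory=dict)  # ident -> Node
    # ident -> list of (edge label, target ident)
    out_edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_node(self, ident, kind):
        node = Node(ident, kind)
        self.nodes[ident] = node
        return node

    def add_edge(self, source, label, target):
        # Both endpoints must already exist, so every edge links known entities.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before linking them")
        self.out_edges[source].append((label, target))

    def neighbors(self, source, label=None):
        """Targets reachable from `source`, optionally restricted to one edge label."""
        return [
            self.nodes[target]
            for edge_label, target in self.out_edges.get(source, [])
            if label is None or edge_label == label
        ]


# Heterogeneous sources end up in the same structure.
graph = UnifiedGraph()
graph.add_node("article:42", "content_fragment")
graph.add_node("article:43", "content_fragment")
graph.add_node("kb:SomeTown", "kb_uri")
graph.add_node("db:municipalities", "table")

graph.add_edge("article:42", "mentions", "kb:SomeTown")
graph.add_edge("article:42", "similar_to", "article:43")
graph.add_edge("kb:SomeTown", "described_in", "db:municipalities")

for node in graph.neighbors("article:42"):
    print(node.kind, node.ident)
```

Because every functionality reads from and writes to the same structure, adding an edge (for instance after an analyst confirms a link) immediately changes what later queries such as `neighbors` return, which is the kind of iterative update loop described above.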
iCODA builds on three use cases that instantiate, in the world of data journalism, the general principles of integrated collaborative analytics.
- Local politics and public projects. Ouest-France has accumulated an immense archive of texts, photos and videos reflecting local life in Brittany. In particular, this archive stores a large number of articles describing the various initiatives taken by local political figures over the years. These articles often record the projects, their developments, their acceptance within the population, their outcome, their costs, etc. In addition, Ouest-France also maintains a number of data and knowledge bases that describe municipalities, mayors, local institutions, etc. Analytics will make it possible to gain insight into existing local initiatives, which may prove useful when designing a new one that shares some resemblance, so as to avoid repeating the same mistakes.
- Fake news propagation. Les Décodeurs have a strong focus on national politics, analyzing and checking statements made by French political figures. As recent elections have demonstrated, fake and/or inflammatory content posted on social media, possibly propagated by bots, can play a significant role in influencing voter opinions. Exploiting Le Decodex, a database of known Twitter accounts, and knowledge on political figures, Les Décodeurs will run an incremental process to analyze archived tweets in order to detect when a certain topic was first brought up in the accounts they follow, by whom, and how it spread in the network.
- Political programs. AFP recently initiated the construction of a data/knowledge base that describes the programs of various politicians in national elections. Exploring this database along with related content and public data should provide insight into national political life. The idea of the AFP use case is, on the one hand, to facilitate the construction of the database from the exploration of content and, on the other hand, to develop technology and tools for searching and exploring the ‘political program’ database as well as validating elements of the programs.
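As an illustration of the fake news propagation use case, the sketch below shows one naive way to find who first raised a topic in a set of archived tweets and how it was reshared. It is only a sketch under strong assumptions, not the process Les Décodeurs will run: the `Tweet` record and its fields, the keyword-based topic match, and the sample data are all hypothetical, and the analysis is done in one batch rather than incrementally.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Tweet:
    """Hypothetical archived-tweet record; field names are assumptions."""
    tweet_id: str
    account: str
    posted_at: datetime
    text: str
    retweet_of: Optional[str] = None  # id of the reshared tweet, if any


def trace_topic(tweets, keyword):
    """Return (earliest tweet on the topic, list of (from_account, to_account) reshares)."""
    # Naive topic match: case-insensitive substring test on the text.
    on_topic = sorted(
        (t for t in tweets if keyword.lower() in t.text.lower()),
        key=lambda t: t.posted_at,
    )
    if not on_topic:
        return None, []
    by_id = {t.tweet_id: t for t in on_topic}
    # One propagation edge per reshare whose source is also on topic.
    spread = [
        (by_id[t.retweet_of].account, t.account)
        for t in on_topic
        if t.retweet_of in by_id
    ]
    return on_topic[0], spread


# Made-up sample data, deliberately out of chronological order.
tweets = [
    Tweet("3", "carol", datetime(2017, 3, 2, 9, 0), "Breaking: rumour X", retweet_of="2"),
    Tweet("1", "alice", datetime(2017, 3, 1, 8, 0), "Breaking: rumour X"),
    Tweet("2", "bob", datetime(2017, 3, 1, 12, 0), "Breaking: rumour X", retweet_of="1"),
    Tweet("4", "dave", datetime(2017, 3, 1, 7, 0), "Unrelated weather update"),
]

first, spread = trace_topic(tweets, "rumour x")
print(first.account, first.posted_at)
print(spread)
```

The resulting `(from_account, to_account)` pairs are directed edges, so they could be added to the same unified graph as the accounts and political figures they refer to.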
iCODA’s consortium is built around four Inria teams with complementary expertise:
- CEDAR – Rich data analytics at cloud scale
- GraphIK – Knowledge representation and reasoning
- ILDA – Collaborative interaction with large datasets
- Linkmedia – Content-based multimedia analysis, indexing and linking
complemented by major national press partners to validate findings in real-life scenarios.