Programme

Schedule

Tutorials

Tuesday, 11:00-13:00: IoT Big Data Stream Mining (slides)

Albert Bifet, Télécom ParisTech, Université Paris-Saclay
Session chair: Joachim Niehren (Inria Lille)

The challenge of deriving insights from the Internet of Things (IoT) has been recognized as one of the most exciting and key opportunities for both academia and industry. Advanced analysis of big data streams from sensors and devices is bound to become a key area of data mining research as the number of applications requiring such processing increases. Dealing with the evolution over time of such data streams, i.e., with concepts that drift or change completely, is one of the core issues in IoT stream mining. This tutorial is a gentle introduction to mining IoT big data streams. The first part introduces data stream learners for classification, regression, clustering, and frequent pattern mining. The second part deals with scalability issues inherent in IoT applications, and discusses how to mine data streams on distributed engines such as Spark, Flink, Storm, and Samza.

Albert Bifet is Associate Professor at Telecom ParisTech. Previously he worked at Huawei Noah’s Ark Lab in Hong Kong, Yahoo Labs in Barcelona, University of Waikato and UPC BarcelonaTech. He is the author of the book Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams. He is one of the leaders of the MOA and Apache SAMOA software environments for implementing algorithms and running experiments for online learning from evolving data streams. He served as Co-Chair of the Industrial track of IEEE MDM 2016 and ECML PKDD 2015, and as Co-Chair of BigMine (2012-2017) and the ACM SAC Data Streams Track (2012-2018).

Tuesday, 14:00-16:00: Knowledge Graph Expansion and Enrichment (slides)

Fatiha Saïs, Université Paris-Sud, Université Paris-Saclay
Session chair: Pierre Senellart (École normale supérieure & Inria Paris)

Today, we are experiencing an unprecedented production of resources published as Linked Open Data (LOD). This is leading to the creation of knowledge graphs (KGs) containing billions of RDF (Resource Description Framework) triples, such as DBpedia, YAGO and Wikidata on the academic side, and the Google Knowledge Graph or Microsoft’s Satori graph on the commercial side. These KGs contain millions of entities (such as people, proteins, or books), and millions of facts about them. This knowledge is typically expressed in RDF, i.e., as triples of the form ⟨Macron, presidentOf, France⟩. Some KGs provide an ontology expressed in OWL2 (Web Ontology Language), which describes the vocabulary (the classes and properties) for the RDF facts. However, to exploit and benefit from the richness of this available data and knowledge, several problems must be addressed, namely data linking, data fusion and knowledge discovery, when the data is large in volume, heterogeneous and evolving. In this tutorial we will first give an overview of existing data linking and key discovery approaches. Then, we will discuss the identity crisis caused by the misuse of the owl:sameAs predicate and give some possible solutions. We will finish by highlighting some current challenges in this research area.
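As a rough illustration of why misused owl:sameAs links matter, the sketch below stores triples as plain Python tuples and groups resources that are connected by owl:sameAs. The resource names, the `ex:` prefix and the helper `same_as_clusters` are hypothetical and are not taken from the tutorial; the point is only that a single wrong link merges otherwise distinct entities into one identity cluster.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object); all names are illustrative.
TRIPLES = [
    ("dbpedia:Paris", "owl:sameAs", "wikidata:Paris"),
    ("wikidata:Paris", "owl:sameAs", "yago:Paris"),
    ("yago:Paris", "owl:sameAs", "ex:Paris_Texas"),  # deliberately erroneous link
    ("dbpedia:Paris", "ex:capitalOf", "ex:France"),
]


def same_as_clusters(triples):
    """Group resources connected by owl:sameAs, using a small union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == "owl:sameAs":
            parent[find(s)] = find(o)

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    return list(clusters.values())


clusters = same_as_clusters(TRIPLES)
for cluster in clusters:
    print(sorted(cluster))
```

Because owl:sameAs is treated as symmetric and transitive here, the erroneous third triple places the hypothetical `ex:Paris_Texas` in the same cluster as `dbpedia:Paris`, so any fact attached to one would be inherited by the other.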

Fatiha Saïs is an Associate Professor at Paris Sud University in France. She obtained her Ph.D. in Computer Science at the University of Paris Sud. Her research interests are ontology-based data linking and fusion, RDF data evolution and knowledge discovery from RDF graphs. Her work has been included in several national, industrial and European projects. She has published more than 50 research papers in national and international conferences (AAAI, ISWC, K-Cap) and journals (JWS, KBS and JoDS).

Thursday, 9:30-11:30: Preference-based Pattern Mining (slides)

Bruno Crémilleux, Université de Caen Normandie
Marc Plantevit, Université de Lyon
Arnaud Soulet, Université François Rabelais de Tours
Session chair: Amedeo Napoli (CNRS, LORIA & Inria Nancy)

This tutorial focuses on the recent shift from constraint-based pattern mining to preference-based pattern mining and interactive pattern mining. Constraint-based pattern mining, which shares common notions with Formal Concept Analysis (FCA), is now a mature domain of data mining that makes it possible to handle various pattern domains (e.g., itemsets, sequences, graphs) with a large variety of constraints, thanks to solid theoretical foundations and efficient algorithmic machinery. It has long been recognized that it is difficult for end users to model their interest in terms of constraints and, above all, to overcome the well-known thresholding issue. Nevertheless, researchers have only recently intensified their study of methods for finding high-quality patterns according to the user’s preferences.

In this tutorial, we discuss the need for preferences in pattern mining and the principles and methods for using them. Many methods are derived from constraint-based pattern mining by integrating utility functions or interestingness measures as a quantitative preference model. This approach transforms pattern mining into an optimization problem guided by user-specified preferences. However, in practice, the user has only a vague idea of what useful patterns could be. The recent research field of interactive pattern mining relies on the automatic acquisition of these preferences and on the development of the instant data mining field.
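To make the move from thresholds to optimization concrete, here is a minimal brute-force sketch that ranks itemsets by a utility function and returns the k best, rather than filtering them with a minimum-support threshold. The toy transactions, the choice of "area" (support times itemset size) as the utility, and the function names are all assumptions made for illustration; they are not the methods presented in the tutorial, and exhaustive enumeration is only workable on tiny examples.

```python
from itertools import combinations

# Hypothetical toy transactions; item names are illustrative only.
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
]


def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def area(itemset, transactions):
    """Assumed utility function: support multiplied by itemset size."""
    return support(itemset, transactions) * len(itemset)


def top_k_patterns(transactions, k, utility=area):
    """Return the k itemsets with the highest utility (exhaustive search)."""
    items = sorted(set().union(*transactions))
    candidates = [
        frozenset(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]
    # Sort by decreasing utility; break ties alphabetically for determinism.
    ranked = sorted(candidates, key=lambda p: (-utility(p, transactions), sorted(p)))
    return ranked[:k]


best = top_k_patterns(TRANSACTIONS, k=1)
print([sorted(p) for p in best])
```

The user supplies only k and a utility function, so no support threshold has to be guessed in advance; swapping `area` for another measure changes the preference model without touching the search.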

Bruno Crémilleux is a professor of computer science at the University of Caen-Normandie. He received his PhD in computer science from the University of Grenoble. His main research interests are pattern (set) discovery, Constraint Satisfaction Problems and data mining, preference queries and exploratory data mining.

Marc Plantevit is an associate professor of computer science at the University of Lyon. He received his PhD in computer science from the University of Montpellier. His research interests include constraint-based pattern mining in general. Currently, he is particularly interested in sophisticated pattern domains (dynamic/attributed graphs) and in incorporating background knowledge into pattern mining.

Arnaud Soulet is an associate professor of computer science at the University François Rabelais of Tours. He received his PhD from the University of Caen. His expertise covers constraint-based pattern mining and user involvement in the mining process, such as pattern mining techniques for preference elicitation.

Technical sessions

Session 1: Uncertainty and security

Tuesday, 16:30-17:30. Chaired by David Gross Amblard (Université Rennes).
  • Sebastian Link and Henri Prade. Conception de Schémas de Bases de Données Relationnelles en présence de Données Incertaines.
  • Paul Tran-Van, Nicolas Anciaux and Philippe Pucheral. SWYSWYK: a Privacy-by-Design Paradigm for Personal Information Management Systems. (slides)

Session 2: Graph data

Tuesday, 17:30-19:00. Chaired by Michaël Thomazo (Inria Saclay).
  • Sara El Hassad, Francois Goasdoue and Helene Jaudoin. Learning Commonalities in SPARQL. (slides)
  • Abdullah Abbas, Pierre Genevès, Cécile Roisin and Nabil Layaïda. Optimisation de l’évaluation de requêtes SPARQL en présence de contraintes ShEx.
  • Damien Graux, Louis Jachiet, Pierre Genevès and Nabil Layaida. Une classification expérimentale multi-critères des évaluateurs SPARQL répartis. (slides)

Session 3: Data mining

Wednesday, 10:30-12:00. Chaired by Amedeo Napoli (CNRS, LORIA & Inria Nancy).
  • Pierre Gançarski, Antoine Cornuéjols, Cédric Wemmert and Younès Bennani. Clustering collaboratif : Principes et mise en œuvre. (slides)
  • Mehdi Zitouni, Reza Akbarinia, Sadok Ben Yahia and Florent Masseglia. Massively Distributed Environments and Closed Itemset Mining the DCIM Approach. (slides)
  • Raef Mousheimish, Yehia Taher and Karine Zeitouni. Apprentissage automatique de règles CEP prédictives: combler le gap entre fouille de données et traitement des événements complexes. (slides)

Session 4: Semi-structured data

Wednesday, 12:00-13:00. Chaired by Bernd Amann (Université Pierre-et-Marie-Curie).
  • Pierre Bourhis, Juan Reutter, Fernando Suárez and Domagoj Vrgoč. JSON: Modèle de données, langage de requête et de schéma / JSON: data model, query languages and schema specification. (slides)
  • Mohamed-Amine Baazizi, Dario Colazzo, Giorgio Ghelli and Carlo Sartiani. Counting Types for Massive JSON Datasets. (slides)

Session 5: Systems and applications

Wednesday, 16:00-18:10. Chaired by Soror Sahri (Université Paris-Descartes).
  • Abdeslem Belghoul, Mourad Baiou, Radu Ciucanu and Farouk Toumani. Optimizing Communication Time via Middleware Tuning.
  • Ji Liu, Luis Pineda, Esther Pacitti, Alexandru Costan, Patrick Valduriez, Gabriel Antoniu and Marta Mattoso. Efficient Scheduling of Scientific Workflows using Hot Metadata in a Multisite Cloud. (slides)
  • Jocelyn De Goër, Myoung-Ah Kang, Xavier Bailly and Engelbert Mephu-Nguifo. PSH-DB, un système clé-valeur permettant l’indexation et la recherche de séquences ADN. (slides)
  • Benjamin Billet, Mickaël Jurret, Didier Parigot and Patrick Valduriez. End-to-end Graph Mapper. (short paper) (slides)
  • Marie Le Guilly, Jean-Marc Petit and Marian Scuturici. Retour d’expérience sur l’analyse des données d’un tunnelier. (short paper)

Session 6: Database theory

Thursday, 14:00-16:00. Chaired by Victor Vianu (U. C. San Diego, ENS & Inria Paris).
  • Momar Sakho, Iovka Boneva and Joachim Niehren. Complexity of Earliest Query Answering for Hyperstreams. (slides)
  • Nadime Francis and Leonid Libkin. Schema Mappings for Data Graphs. (slides)
  • Luc Segoufin and Alexandre Vigny. Énumération des requêtes du premier ordre sur classes de bases de données avec local bounded expansion. (slides)
  • Antoine Amarilli, Pierre Bourhis, Louis Jachiet and Stefan Mengel. Une approche par circuit pour une énumération efficace. (slides)

Session 7: Social networks

Friday, 10:30-13:00. Chaired by Pierre Senellart (École normale supérieure & Inria Paris).
  • Jean-Benoit Griesner, Talel Abdessalem, Hubert Naacke and Pierre Dosne. ALGeoSPF: A Hierarchical Geographical Factorization Model for POI Recommendation.
  • Quentin Grossetti, Camelia Constantin, Cédric Du Mouza and Nicolas Travers. Enhance micro-blogging recommendations of posts with an homophily-based graph. (slides)
  • Abdulhafiz Alkhouli and Dan Vodislav. Continuous processing of diversity-aware top-k queries in social networks.
  • Paul Lagrée, Olivier Cappe, Bogdan Cautis and Silviu Maniu. Maximisation en ligne et à grande échelle de l’influence sur les réseaux sociaux. (slides | code)
  • Maximilien Danisch, Hubert Chan and Mauro Sozio. Large Scale Density-friendly Graph Decomposition via Convex Programming. (slides | code)

Demonstrations

Thursday, 16:00-17:30
  • Angela Bonifati, Ioana Ileana and Michele Linardi.
    ChaseFUN: Un moteur d’Échange de Données efficace avec (et malgré) les dépendances fonctionnelles (poster).
  • Cyrille Ponchateau, Ladjel Bellatreche, Carlos Ordonez and Mickael Baron.
    MathMOuse: A Mathematical MOdels WarehoUSE to handle both Theoretical and Numerical Data (poster | code)
  • Karima Rafes, Sarah Cohen-Boulakia and Serge Abiteboul.
    Une infrastructure d’autocomplétion pour SPARQL générique et multi-services (poster | tutorial | demo | software)
  • Xiangnan Ren, Olivier Curé, Ke Li, Jeremy Lhez, Badre Belabbess, Tendry Randriamalala, Yufan Zheng and Gabriel Kepeklia.
    Strider: An Adaptive, Inference-enabled Distributed RDF Stream Processing Engine (code)
  • Olivier Rodriguez, Corentin Colomier, Cecilie Rivière, Reza Akbarinia and Federico Ulliana.
    Parallelizing Query Rewriting for Key-Value Stores Under Simple Semantic Constraints (poster | code)

PhD student session

Wednesday, 14:00-15:30
  • Joris Duguépéroux. Garanties de confidentialité et d’efficacité sur les plate-formes de crowdsourcing (poster)
  • Louis Jachiet, Pierre Geneves, Nabil Layaida and Nils Gesbert. Une nouvelle algèbre pour SPARQL permettant l’optimisation des requêtes contenant des Property Paths (poster)
  • Nyoman Juniarta, Chedy Raïssi and Amedeo Napoli. Echantillonnage et fouille de motifs séquentiels – Application à l’analyse de trajectoires de visiteurs (poster)
  • Marie Le Guilly. Langages de requêtes interactifs pour l’exploration de données
  • Rutian Liu. Computing Schema Complements over Analytical Datasets (poster)
  • Pierre Monnin, Amedeo Napoli and Adrien Coulet. Confirming and Suggesting Subsumption Relations in an Ontology using Formal Concept Analysis (poster)
  • Chao Zhang, Farouk Toumani and Emmanuel Gangler. Symmetric and Asymmetric Aggregate Function in Massively Parallel Computing (poster)
