Programme and Events

Tutorials

IoT Big Data Stream Mining

Albert Bifet, Télécom ParisTech, Université Paris-Saclay

The challenge of deriving insights from the Internet of Things (IoT) has been recognized as one of the most exciting and key opportunities for both academia and industry. Advanced analysis of big data streams from sensors and devices is bound to become a key area of data mining research as the number of applications requiring such processing increases. Dealing with the evolution over time of such data streams, i.e., with concepts that drift or change completely, is one of the core issues in IoT stream mining. This tutorial is a gentle introduction to mining IoT big data streams. The first part introduces data stream learners for classification, regression, clustering, and frequent pattern mining. The second part deals with scalability issues inherent in IoT applications, and discusses how to mine data streams on distributed engines such as Spark, Flink, Storm, and Samza.

Albert Bifet is Associate Professor at Télécom ParisTech. Previously he worked at Huawei Noah’s Ark Lab in Hong Kong, Yahoo Labs in Barcelona, the University of Waikato, and UPC BarcelonaTech. He is the author of the book Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams. He is one of the leaders of the MOA and Apache SAMOA software environments for implementing algorithms and running experiments for online learning from evolving data streams. He served as Co-Chair of the Industrial track of IEEE MDM 2016 and ECML PKDD 2015, and as Co-Chair of BigMine (2012-2017) and the ACM SAC Data Streams Track (2012-2018).

Semantic Data Management in Practice

Olivier Curé, Université Paris-Est Marne-la-Vallée

After years of research and development, standards and technologies for semantic data are sufficiently mature to be used as the foundation of novel data science projects that employ semantic technologies in various application domains such as finance, life and social sciences, etc. Typically, such projects are carried out by domain experts who have a conceptual understanding of semantic technologies but lack the expertise to choose and employ existing data management solutions for the semantic data in their project. For such experts, including domain-focused data scientists, project coordinators, and project engineers, our tutorial delivers a practitioner’s guide to semantic data management. We discuss the following important aspects of semantic data management and demonstrate how to address them in practice by using mature, production-ready tools: i) storing and querying semantic data; ii) automated reasoning understanding.

Olivier Curé is a tenured associate professor of computer science at the University of Paris-Est Marne-la-Vallée (UPEM) in France. He obtained his Ph.D. in Artificial Intelligence from the Université Paris V, France. His research interests are data and knowledge base management systems, semantic information, and reasoning. He has published 1 book (“RDF Database Systems: Triples Storage and SPARQL Query Processing”, Morgan Kaufmann, 2014), 5 book chapters, 13 journal papers, and over 60 research papers in international, peer-reviewed conferences on Databases, Semantic Web, and Big Data.

Preference-based Pattern Mining

Bruno Crémilleux, Université de Caen Normandie
Marc Plantevit, Université de Lyon
Arnaud Soulet, Université François Rabelais de Tours

This tutorial focuses on the recent shift from constraint-based pattern mining to preference-based pattern mining and interactive pattern mining. Constraint-based pattern mining, which shares common notions with formal concept analysis (FCA), is now a mature domain of data mining that can handle various pattern domains (e.g., itemsets, sequences, graphs) with a large variety of constraints, thanks to solid theoretical foundations and efficient algorithmic machinery. Although it has long been recognized that it is difficult for end users to model their interests in terms of constraints and, above all, to overcome the well-known thresholding issue, researchers have only recently intensified their study of methods for finding high-quality patterns according to the user’s preferences.

In this tutorial, we discuss the need for preferences in pattern mining, and the principles and methods for using them. Many methods are derived from constraint-based pattern mining by integrating utility functions or interestingness measures as quantitative preference models. This approach transforms pattern mining into an optimization problem guided by user-specified preferences. However, in practice, the user has only a vague idea of what useful patterns could be. The recent research field of interactive pattern mining relies on the automatic acquisition of these preferences and on the development of the instant data mining field.

Bruno Crémilleux is a professor of computer science at the University of Caen Normandie. He received his PhD in computer science from the University of Grenoble. His main research interests are pattern (set) discovery, constraint satisfaction problems and data mining, preference queries, and exploratory data mining.

Marc Plantevit is an associate professor of computer science at the University of Lyon. He received his PhD in computer science from the University of Montpellier. His research interests include constraint-based pattern mining in general. Currently, he is particularly interested in sophisticated pattern domains (dynamic/attributed graphs) and in incorporating background knowledge into pattern mining.

Arnaud Soulet is an associate professor of computer science at the University François Rabelais of Tours. He received his PhD from the University of Caen. He has expertise in constraint-based pattern mining and in user involvement in the mining process, such as pattern mining techniques for preference elicitation.

Accepted Papers (Long and Short)

The following papers have been accepted for presentation at BDA 2017. They are listed in no particular order.

  • Abdullah Abbas, Pierre Genevès, Cécile Roisin and Nabil Layaïda. Optimisation de l’évaluation de requêtes SPARQL en présence de contraintes ShEx.
  • Mehdi Zitouni, Reza Akbarinia, Sadok Ben Yahia and Florent Masseglia. Massively Distributed Environments and Closed Itemset Mining: The DCIM Approach.
  • Pierre Bourhis, Juan Reutter, Fernando Suárez and Domagoj Vrgoč. JSON: Modèle de données, langage de requête et de schéma (JSON: data model, query languages and schema specification).
  • Luc Segoufin and Alexandre Vigny. Énumération des requêtes du premier ordre sur classes de bases de données avec local bounded expansion.
  • Jocelyn De Goër, Myoung-Ah Kang, Xavier Bailly and Engelbert Mephu-Nguifo. PSH-DB, un système clé-valeur permettant l’indexation et la recherche de séquences ADN.
  • Sebastian Link and Henri Prade. Conception de Schémas de Bases de Données Relationnelles en présence de Données Incertaines.
  • Sara El Hassad, François Goasdoué and Hélène Jaudoin. Learning Commonalities in SPARQL.
  • Raef Mousheimish, Yehia Taher and Karine Zeitouni. Apprentissage automatique de règles CEP prédictives: combler le gap entre fouille de données et traitement des événements complexes.
  • Antoine Amarilli, Pierre Bourhis, Louis Jachiet and Stefan Mengel. Une approche par circuit pour une énumération efficace (A Circuit-Based Approach to Efficient Enumeration).
  • Pierre Gançarski, Antoine Cornuéjols, Cédric Wemmert and Younès Bennani. Clustering collaboratif : Principes et mise en œuvre.
  • Damien Graux, Louis Jachiet, Pierre Genevès and Nabil Layaïda. Une classification expérimentale multi-critères des évaluateurs SPARQL répartis.
  • Benjamin Billet, Mickaël Jurret, Didier Parigot and Patrick Valduriez. End-to-end Graph Mapper.
  • Maximilien Danisch, Hubert Chan and Mauro Sozio. Large Scale Density-friendly Graph Decomposition via Convex Programming.
  • Paul Tran-Van, Nicolas Anciaux and Philippe Pucheral. SWYSWYK: a Privacy-by-Design Paradigm for Personal Information Management Systems.
  • Momar Sakho, Iovka Boneva and Joachim Niehren. Complexity of Earliest Query Answering for Hyperstreams.
  • Abdulhafiz Alkhouli and Dan Vodislav. Continuous processing of diversity-aware top-k queries in social networks.
  • Jean-Benoit Griesner, Talel Abdessalem, Hubert Naacke and Pierre Dosne. ALGeoSPF: A Hierarchical Geographical Factorization Model for POI Recommendation.
  • Mohamed-Amine Baazizi, Dario Colazzo, Giorgio Ghelli and Carlo Sartiani. Counting Types for Massive JSON Datasets.
  • Abdeslem Belghoul, Mourad Baiou, Radu Ciucanu and Farouk Toumani. Optimizing Communication Time via Middleware Tuning.
  • Paul Lagrée, Olivier Cappé, Bogdan Cautis and Silviu Maniu. Maximisation en ligne et à grande échelle de l’influence sur les réseaux sociaux.
  • Nadime Francis and Leonid Libkin. Schema Mappings for Data Graphs.
  • Ji Liu, Luis Pineda, Esther Pacitti, Alexandru Costan, Patrick Valduriez, Gabriel Antoniu and Marta Mattoso. Efficient Scheduling of Scientific Workflows using Hot Metadata in a Multisite Cloud.
  • Marie Le Guilly, Jean-Marc Petit and Marian Scuturici. Retour d’expérience sur l’analyse des données d’un tunnelier.
  • Quentin Grossetti, Camelia Constantin, Cédric Du Mouza and Nicolas Travers. Enhance micro-blogging recommendations of posts with an homophily-based graph.
