In’Tro (December 13, 2021, 1:30 pm): An introduction to Topological Data Analysis – Mathieu Carrière (DataShape)

Watch Mathieu’s presentation

Abstract

Topological Data Analysis (TDA) is a growing field of research at the intersection of data science, computational geometry, and topology. It has had key successes in a range of applications (from cancer subtype identification in bioinformatics to shape recognition in computer vision, to name just two), and in recent years it has become the flagship product of several companies. Indeed, many data sets nowadays come as point clouds embedded in very high-dimensional spaces, yet concentrated around low-dimensional geometric structures that need to be uncovered. Uncovering these structures is precisely the goal of TDA, which builds descriptors that reliably capture geometric and topological information (connectivity, loops, holes, curvature, etc.) from the data without requiring an explicit mapping to a lower-dimensional space. This is extremely useful, since the hidden, non-trivial topology of many data sets can make it very challenging for classical techniques in data science and machine learning, such as dimensionality reduction, to perform well.

In this talk, I will give a broad overview of TDA by introducing its main descriptors and presenting the theoretical guarantees that they enjoy. I will also show how they can be computed efficiently in practice with the dedicated open-source library GUDHI, and describe some applications where TDA has proved useful.

Short bio

I did my PhD at Inria Saclay in the DataShape team, under the supervision of Steve Oudot, followed by a two-year postdoc in the Rabadán Lab at the Department of Systems Biology of Columbia University, under the supervision of Raúl Rabadán. My research focuses on topological data analysis (TDA) and statistical machine learning (ML), with applications to bioinformatics and genomics. I have contributed to the analysis of topological descriptors and their use in ML methods such as kernel SVMs and deep learning. My favorite languages are C++ and Python, but I also know a bit of R, Matlab and Java. I am also very familiar with Scikit-Learn and TensorFlow.

The presentation will be in English and streamed on BBB