# Gudhi/Top Data Workshop – October 17-20, 2016

## Invited talks:

• Anthony Bak (Palantir Technologies)
“Persistent Homology as Feature Generation for Drug Discovery”
Abstract: We use persistent homology as a way to generate features for a database of drug-like molecules. Our goal is to separate out likely drugs and the particular problem we tackle is complicated by multi-species effects (you don’t want an antibiotic to also attack the host). We are able to meet state of the art computational chemistry accuracy for this problem.
• Senja Barthel (EPFL)
Three examples of persistent homology applications in material science
Abstract: Material science is a field with many potential applications of persistent homology. Depending on the setting, the focus can lie more on statistical analyses or on identifying very specific properties of given materials.
Three projects will be presented:
1) Description of pore shapes of nano-porous materials to construct a new descriptor for gas adsorption and separation, in particular carbon capture and methane storage
2) Prediction of ionic conductivity in superionic conductors, for example by determining energetic barriers along diffusion paths
3) Analysis of hydrogen-bond networks in water
• Van Bui Tran (IFP)
“Examples of problems involving data exploration in the fields of oil&gas and mobility”
Abstract: Some examples of real-world problems in the fields of oil&gas and mobility are given. Real data will be described as well as the problems as they are expressed by the end users. Solutions to the problems are not addressed.

In one example, data are real images that are to be paired with their counterparts in a digital archive of images (and videos) obtained from experiments according to some meaningful features to be detected. In a second example, a “good” three-dimensional chemical structure is to be associated to a desired performance, and then will be serving as a reference structure. In other two examples coming from different application domains,  a “good” behavior over time has to be characterized from a variety of data of different types (numeric, text, time-series of different lengths, discrete events, …).

• Frédéric Cazals (Inria, ABS Team)
Exploring Energy Landscapes
Abstract: In this talk, I will discuss two key problems arising while studying conformational ensembles of (bio-)molécules.

The first problem consists in sampling a potential energy landscape, namely the potential energy function of a molecular system.  For this problem, a hybrid algorithm mixing a Monte Carlo based strategy and rapidly exploring random trees–which embark Voronoi diagrams in disguise–will be presented.

The second problem consists in comparing two molecular conformations. Such a comparison is ubiquitous, and is in particular at the heart of our hybrid algorithm. The classical method  to accomplish it is to compute the so-called root mean square deviation (lRMSD), which provides poor information for medium range lRMSD values. A more local method will be presented, based on the  identification of quasi-isometric sets, obtained from suitable persistence diagrams encoding the conservation of pairwise distances.

• Michael Kerber (Graz University)
Abstract: The computational pipeline of TDA consists of three major steps: (1) Computing a multi-scale representation of input data (e.g., a filtration of simplicial complexes), (2) determining the topological invariants of the representation, and (3) evaluate the result to improve the understanding of the data. I will report on two recent projects that relate to (1) and (3), respectively: The first project is a novel approach to approximate Cech and Rips filtrations using permutahedral tilings of Euclidean space (joint work with Aruni Choudhary and Sharath Raghvendra).
The second project is an efficient algorithm to compute the Bottleneck and Wasserstein distance of persistence diagrams, exploiting their geometric nature (joint work with Dmitriy Morozov and Arnur Nigmetov).
• André Lieutier (Dassault Systemes)
“When supports of minimal chains are manifolds”
Abstract: We study the minimization of weighted $L^1$ norms over simplicial chains in a given homology class. We consider in particular the fundamental classes of complexes (Cech of Rips for examples) that capture the homology of smooth compact manifolds embedded in Euclidean space. In this context, and under specific conditions, the well-known sparsity tendency of $L^1$ minima favors chains whose supports — {\em i.e.} the sets of simplices with non-zero coefficients — are triangulations of those manifolds. Experiments for 2-manifolds seem to confirm this behavior. In the Euclidean case, a family of weights previously used for mesh optimization, provably leads to Delaunay triangulations as supports of minimal chains. We further study the behavior of minima in non Euclidean case (Cech complex over a points sample of an embedded manifold with positive reach).We present partial results of this ongoing work. In particular, for 2-dimensional manifolds, we propose a sketch of proof of the fact that a well-chosen weight leads to triangulations.
• Nina Otter  (Oxford University)
Multi-parameter persistent homology: applications and algorithms
Abstract: A fundamental tool in topological data analysis is persistent homology, which allows one to extract qualitative information from data. While the theory of persistent homology for filtrations associated to a single parameter is well-understood, the situation for multiple parameters is more delicate; Carlsson and Zomorodian introduced persistent homology for multi-filtered complexes via multi-graded modules over a polynomial ring and showed that in the multi-parameter case there is no analogue of barcodes, which give a classification of isomorphism classes of one-parameter persistence modules.

In this talk I will first briefly cover the complexity of the theory for multi-parameter persistent homology; I will then discuss some applications, and algorithms related to the computation of a presentation of multi-parameter persistence modules.

This talk is based on work in progress, joint with Heather Harrington, Henry Schenck, and Ulrike Tillmann.

• Hannah Schreiber  (Graz University)
Barcodes of Towers and a Streaming Algorithm for Persistent Homology
Abstract: A tower is a sequence of simplicial complexes connected by simplicial maps. We will show how to compute a filtration, a sequence of nested simplicial complexes, with the same persistent barcode as the tower. Our approach is based on the coning strategy by Dey et al. (SoCG 2014). The variant of this approach yields a filtration that is asymptotically only marginally larger than the tower and can be efficiently computed by a streaming algorithm, both in theory and in practice. Furthermore, we show that our approach can be combined with a streaming algorithm to compute the barcode of the tower via matrix reduction. The space complexity of the algorithm does not depend on the length of the tower, but the maximal size of any subcomplex within the tower (work in progress, joint with Micheal Kerber).

## Talks:

• Eddie Aamari
About Reach Estimation: Geometric Stability and Minimax Bounds
Abstract: Various problems within computational geometry and manifold learning encode geometric regularity through the so-called reach, a generalized convexity parameter. The reach $\tau_M$ of a submanifold $M \subset \R^D$ is the maximal offset radius on which the projection onto $M$ is well defined. The quantity $\tau_M$ renders a certain minimal scale of $M$, giving bounds on both maximum curvature and possible bottleneck structures. In this talk, we will study the geometry of the reach through an approximation theory perspective. We present new geometric results on the reach of submanifolds without boundary. An estimator $\hat{\tau}_n$ of $\tau_M$ will be described, in an idealized i.i.d. framework where tangent spaces are known. The estimator $\hat{\tau}_n$ is showed to achieve uniform expected loss bounds over a $\mathcal{C}^3$-like model. Minimax upper and lower bounds are derived. We will conclude with an extension to a model in which tangent spaces are unknown.

• Claire Brecheteau
The distance-to-a-measure signature for a bootstrap test of isomorphism between metric measure spaces.”
Abstract: In this talk, we focus on the problem of testing the equality of metric measure spaces (mm-spaces) up to an isomorphism, from two samples of points. For this purpose, we introduce a new shape signature, the distance-to-a-measure signature, which is a probability measure on $\R_+$ built from the mm-space of interest. This signature is inspired by the object distance to a measure introduced by Frédéric Chazal, David Cohen-Steiner and Quentin Mérigot in 2011. Our test rejects the hypothesis of equality of mm-spaces up to an isomorphism whenever the statistic, that is, the $L_1$-Wasserstein distance between the empirical signatures, is too large.To perform the test, we use bootstrap to approximate the law of the statistic under the null. The rate of convergence of the $L_1$-Wasserstein distance between the two laws is shown to be as close as we wish to $N^{-\frac{1}{\max\{d,2\}}}$ for measures supported on compact subsets of $\R^d$, and to $N^{-\frac{1}{2}}$ if we add an assumption of Ahlfors regularity for the measures and of connexity for their support. Under some assumptions, we prove that these laws converge to some fixed law when the size of the samples goes to infinity. Then, the test is proved to be asymptotically of the correct level.Moreover, we derive a lower bound for the power of the test depending on the discriminative quantity, the $L_1$-Wasserstein distance between the two underlying signatures. Finally, we derive some lower bounds for this discriminative quantity under some alternatives.
• Pawel Dlotko
Goodies in Statistic and ML.”

Abstract:
• Kunal Dutta
Shallow Packings and their Applications”

Abstract:
• Vincent Rouvreau
What is new in GUDHI v.1.4.0  ?
Abstract: After a short presentation of the GUDHI project and its main goals, the following new modules will be presented:
– Tangential complex.
– Relaxed witness complex.
– Bindings with Phat.
– Bottleneck distance.
– Gudhi stat, statistics on persistence diagrams and distance to measure.
– Python interface via Cython
• Boris Thibert
Non isometric shape matching and Generalized Voronoi Covariance Measure
Abstract: I will present in this talk two problems arising in shape matching and in geometry estimation. I will first consider the problem of finding meaningful correspondences between 3D models that are related but not necessarily very similar. When the shapes are quite different, a point-to-point map is not always appropriate, so the idea is to build a set of correspondences between shape regions. Based on the observation that for many feature functions on the shapes, points in matching regions have a similar rank, we build an affinity matrix and define a pair of regions to be stable if it corresponds to the highest values of the affinity matrix. The problem of finding a stable pair amounts to solve a biclustering problem in a bipartite graph and we propose a truncated power method to solve it in a discrete setting.In the second part of the talk, I will present the Generalized Voronoi Covariance Measure. The Voronoi Covariance Measure of a compact set is a tensor-valued measure that encodes geometric information of the compact set and which is known to be resilient to Hausdorff noise but sensitive to outliers. We generalize this notion to distance-like functions and show that when used with the distance to a measure, it provides an accurate tool resilient to outliers for normal estimation.
• Mathijs Wintraecken
Approximating Riemannian Voronoi Diagrams and Delaunay Triangulations
Abstract: We discuss the numerical approximation of Voronoi diagrams on domains in $\mathbb{R}^d$ endowed with a Riemannian metric. We give conditions such that the dual of our approximation of the Voronoi diagram coincides with the Riemannian Delaunay triangulation. Our approximation therefore gives a triangulation, under certain conditions. We also discuss embeddings of a realization of the triangulation in the domain with Riemannian simplices or straight simplices.

## Practical informations:

##### Accommodation:

The number of single room is limited.

Accommodation in double room for students.

Oher participants in double room or single, depending on availability.

##### Mission:

Agents from Inria Sophia: ERC GUDHI – Allocation 8349 – DB 04RECH6016-S

Agents from Inria Saclay: see with Christine – DB 11RECH1046-S

##### Sojourn:

The fees will be cover by the project from Monday October 17 (lunch included) til Thursday October 20 (morning).

##### Transportation:
###### From Saclay:

Plane:  Monday October 17, Paris/Toulon available flights:  7:55 –> 9:15  or  8:50 –> 10:10

Taxi: Toulon Airport/Tour Fondue Giens – Around 25 € – it takes about 20 mn

Bus: Toulon Airport/Tour Fondue Giens – Lines : 63 then 67 – it takes about 40 mn. More information

###### From Sophia:

By car: A8 exit  Hyères/Toulon/Le Luc –  around 1h45 mn

You can leave your vehicule in the parking near the jetty of Tour Fondue – Cost around 15 € / 24h

By train: from Nice to Toulon Station around 1h40 mn – from Antibes around 1h20 mn

from Nice to Hyères Station around 3 h – from Antibes around 2h40 mn

Bus: Toulon station/Tour Fondue – Lines 9-39 then 67 – more than 2 h.

Hyères station/Tour Fondue Giens – Line 67 – It takes about 20 mn.