Short presentations – Combinatorics and Life Sciences

This list may be updated still until September 18.

Léo Ackermann, Univ. Rennes, Inria, CNRS, IRISA, Rennes, France

Title talk: Counting Simple Cycles in the de Bruijn Graph

Short abstract: Efficiently counting and enumerating simple cycles in the de Bruijn graphs (dBG) could help establish worst-case performance guarantees for minimizers in bioinformatics. Computing minimizers involves sliding a window over the sequence to be sketched, which corresponds precisely to a walk in the dBG. However, the exponential increase in nodes and edges in the dBG as their order grows renders depth-first traversal approaches untractable and necessitates combinatorial insights. In this work, we compile and present existing results on Lyndon words in the context of dBG cycles for the bioinformatics community. Additionally, we derive two formulas for the number of simple cycles of lengths n+1 and n+2 in a dBG of order n, and conjecture a third formula for the length n+3. These results are implemented in a Rust library that comes with a command-line interface.

Théo Boury, Laboratoire d’Informatique de l’École Polytechnique (LIX), France

Title talk: Parameterized approaches for the RNA energy barrier

Short abstract: The computation of energy barriers between RNA structures is a classic NP-hard problem in Bioinformatics, of which the Independent Set reconfiguration in bipartite graphs represents a natural generalization. Parameterized algorithms, based on parameters taking limited or bounded values on biological instances, are thus crucial towards practical solutions. We have shown (currently in preprint) that bipartite Independent Set reconfiguration is slice-wise Polynomial (XP) solvable for two parameters: the range \rho (extension of the barrier) allowed along the reconfiguration, and the arboricity \phi when the input is restricted to a specific class of graphs that we called circle graph. Such a setting is relevant to Bioinformatics as it provides a solution to the RNA energy barrier problem. We propose algorithms based on divide-and-conquer approaches, yielding a O(n^2)-space, O(n^{2\rho+2.5})-time algorithm for the range \rho, and a O(n^{\phi}+2)-space, O(n^{\phi}+3)-time algorithm for the arboricity \phi. This talk aims to introduce the problem and discuss these two different but complementary algorithmic strategies.

Samuel Braunfeld, Charles University, Czech Republic

Title talk: Coloring and shift-graphs in simple graph classes

Short abstract: We study chi-boundedness, which relates the number of colors needed for a valid coloring of a graph to the size of its largest clique, in a family of graph classes where the edge-relation is determined by a simple local algorithm. We show that the failure of chi-boundedness in such a class is always witnessed by a class of shift graphs. For classes that admit a suitable presentation, we also provide an algorithm deciding chi-boundedness. Joint work with Sarosh Adenwalla, Tomás Hons, John Sylvester, and Viktor Zamaraev.

Guillaume Cerutti, Inria Lyon, France

Title talk: Plant Tissues as Topological Complexes: Reconstruction \& Other Combinatorial Challenges

Short abstract: The geometry of multicellular tissues can be formalized as polyhedral cellular complexes, topological structures where the highest dimension elements (cells) are linked together by their boundaries (faces), themselves forming part of the complex. In the dual structure of this geometry complex, the cells in the tissue become vertices, and are connected by edges reflecting cell adjacencies. This adjacency complex also contains higher order elements (corresponding to junctions between more that 2 cells) that we can reasonably assume to be simplicial. We therefore consider that the topology of a tissue can be represented as a tetrahedrization (or a triangulation in 2D) built upon points that correspond to cells. Advances in digital microscopy now enable the live imaging of developing tissues in 3D at sub-cellular resolution. Combined with state-of-the-art segmentation, this yields voxel-level reconstructions of tissue geometry as images in which every cell is labelled. The challenge is then to recover the simplicial complex of cell adjacency based on this voxel information. The resulting structures could serve not only as topological representations, but also as meshes for simulating biochemical or mechanical processes. However, due the discrete nature of image lattices, and to local artifacts of the segmentation methods, straightforward methods (based on the adjacency graph, or the extraction of 4-way junctions) generally fail to produce valid simplicial complexes. Instead, we are faced with a combinatorial problem: selecting an optimal set of simplices, built upon the tissue cells taken as a set of vertices, that forms a valid complex and matches as much as possible the adjacency information from the image, while satisfying the regularity constraints required for numerical simulations. Over the past years, our team has proposed several computational geometry algorithms to tackle this problem, but none can be proven to achieve an optimal solution in the general 3D case. Beyond reconstruction, representing tissues as simplicial complexes opens new directions to address computational biology problems. For tissues imaged over time for instance, the problem of cell lineaging (identifying the parent of each cell in the previous frame) could be formalized as finding an **optimal edit path** between the simplicial complexes of consecutive time points. The same could be envisioned for matching different individuals at cell-level to study how variable or conserved the patterns of cellular organization are. Formulating such biological questions as well-posed combinatorial problems could create an exciting bridge between geometry, topology, and developmental biology.

Lamine Diop, EPITA Research Laboratory, Le Kremlin-Bicêtre, France

Title talk: RPS: A Generic Reservoir Patterns Sampler

Short abstract: Efficient learning from streaming data is important for modern data analysis due to the continuous and rapid evolution of data streams. Despite significant advancements in stream pattern mining, challenges persist, particularly in managing complex data streams like sequential and weighted itemsets. While reservoir sampling serves as a fundamental method for randomly selecting fixed-size samples from data streams, its application to such complex patterns remains largely unexplored. In this study, we introduce an approach that harnesses a weighted reservoir to facilitate direct pattern sampling from streaming batch data, thus ensuring scalability and efficiency. We present a generic algorithm capable of addressing temporal biases and handling various pattern types, including sequential, weighted, and unweighted itemsets. Through comprehensive experiments conducted on real-world datasets, we evaluate the effectiveness of our method, showcasing its ability to construct accurate incremental online classifiers for sequential data. Our approach not only enables previously unusable online machine learning models for sequential data to achieve accuracy comparable to offline baselines but also represents significant progress in the development of incremental online sequential itemset classifiers.

Laurent Feuilloley, CNRS, LIRIS, Lyon, France

Title talk: On the diameter of locally constrained trees

Short abstract: The following problem is motivated by recent results in distributed computing. Consider an algorithm A that decides whether some neighborhood of a graph is “correct” or not, and say that the class associated with A is the set of trees where all neighborhoods are correct. We study the diameter landscapes: what are the possible diameters for these trees. Although the original motivation is not related to life science, we believe that some aspects could be of interest for the theoretical study of living being structures. Based on joint work with Nicolas Bousquet, Antonin Kiladjian and Théo Pierron.

Andreas Grigorjew, University Paris Dauphine-PSL, France

Title talk: Graph Parameters for Minimum Flow Decomposition in Genome Assembly

Short abstract: In RNA transcript assembly, sequencing data is represented as a splice graph, where weighted paths correspond to possible RNA sequences. More generally, this is an instance of the minimum flow decomposition (MFD) problem: expressing a flow as the sum of the fewest weighted paths. Because MFD is strongly NP-hard, heuristics are often used to solve the problem. I will describe structural properties of these digraphs that explain their strong performance, identify the cases where heuristics break down, and show how these insights deepen our theoretical understanding of the MFD problem.

Florian Ingels, University of Lille, France

Title talk: Imbalance in minimizer-based partitions of k-mers: a theoretical approach

Short abstract: The minimizer of a word of size k (a k-mer) is defined as its smallest substring of size m (with m≤k), according to some ordering on m-mers. minimizers have been used in bioinformatics — notably — to partition sequencing datasets, binning together k-mers that share the same minimizer. It is folklore that using the lexicographical order lead to very unbalanced partitions, resulting in an abundant literature devoted to devising alternative orders for achieving better balanced partitions. To the best of our knowledge, the unbalanced-ness of lexicographical-based minimizer partitions has never been investigated from a theoretical point of view. In this work, we aim to fill this gap and determine, for a given minimizer, how many k-mers would admit the chosen minimizer — i.e. what would be the size of the bucket associated to the chosen minimizer in the worst case, where all k-mers would be seen in the data.

Annachiara Korchmaros, University of Leipzig, Germany

Title talk: REvolutionH-tl: A Fast and Robust Tool for Decoding Evolutionary Gene Histories

Short abstract: Understanding how genes evolve across species requires identifying orthologs, reconstructing gene and species trees, and reconciling them into coherent scenarios. Existing approaches often rely on multiple tools and computationally heavy likelihood methods, limiting scalability. I will present REvolutionH-tl, an open-source, graph-based framework that integrates orthogroup detection, event-labeled gene tree reconstruction, species tree inference, and reconciliation into a single pipeline. The tool works in duplication–loss (DL) scenarios and achieves both accuracy and speed by exploiting the combinatorial structure of best match graphs. Benchmarking shows that it matches or outperforms widely used methods while producing intuitive visualizations. I will also outline ongoing work to extend REvolutionH-tl to handle horizontal gene transfers (HGTs), opening new directions for studying complex reticulate evolution.

Denis Kuperberg, ENS de Lyon, France

Title talk: Complexity of detecting autocatalysis in chemical reaction networks

Short abstract: We investigate the problem of detecting and enumerating autocatalytic cycles in a chemical reaction network. This problem comes in several variants, depending on the shape of the input, and on whether irreversible reactions are allowed. For some variants, we are able to show NP-completeness of the problem, while others are still open.

Roberto Santana, University of the Basque Country, Spain

Title talk: Estimation of Distribution Algorithms for Combinatorial Problems in Bioinformatics

Short abstract: Estimation of distribution algorithms (EDAs) are powerful optimization techniques that leverage probabilistic graphical models (PGMs) to guide the search for high-quality solutions. EDAs explicitly model promising solution distributions, making them particularly effective for complex combinatorial spaces. Their flexibility in handling both continuous and discrete representations—coupled with adaptable PGM selection—enables tailored applications across diverse bioinformatics challenges. In this talk, we explore how EDAs address structure-search and design problems framed as single- or multi-objective optimization tasks. Key applications include protein structure prediction, de-novo protein design, and multi-marker tagging SNP selection. We highlight recent methodological advances in EDAs, such as improved model learning and scalability, and discuss their potential to tackle emerging problems. A focal example will be the design of simplified vaccine candidates, where EDAs optimize antigenic properties while preserving immunogenic core features. Finally, we outline open challenges and opportunities for EDA-driven innovation in bioinformatics.

Nicolas Schabanel, CNRS – ENS Lyon, France

Title talk: Some algorithmicable questions in DNA nanostructure design

Short abstract: DNA origami is a now standard technique in synthetic biology to build biocompatible nanostructure reliably. However, designing such structures remains largely based on intuitive approaches based on questionable assumptions. Geometrical approaches of the design of DNA origami revealed a much broader design landscape than initially thought and many design issue could be optimized once properly formulated in an algorithmic and geometrical framework. In this talk, we will review some of these issues.