Publications

HAL publications of the ANR project ANR-17-CE23-0018

2024

Journal articles

Title
Causal inference methods for combining randomized trials and observational studies: a review
Authors
Bénédicte Colnet, Imke Mayer, Guanhua Chen, Awa Dieng, Ruohong Li, Gaël Varoquaux, Jean-Philippe Vert, Julie Josse, Shu Yang
Reference
Statistical Science, In press
Short abstract
With increasing data availability, causal effects can be evaluated across different data sets, both …..
Full text and BibTeX
https://hal.science/hal-03008276/file/main.pdf BibTex

Preprints, Working Papers, …

Title
On the consistency of supervised learning with missing values
Authors
Julie Josse, Jacob M. Chen, Nicolas Prost, Erwan Scornet, Gaël Varoquaux
Reference
2024
Short abstract
In many application settings, the data have missing entries which make analysis challenging. An abun …..
Full text and BibTeX
https://hal.science/hal-02024202/file/main.pdf BibTex

2023

Journal articles

Title
Relational Data Embeddings for Feature Enrichment with Background Information
Authors
Alexis Cvetkov-Iliev, Alexandre Allauzen, Gaël Varoquaux
Reference
Machine Learning, 2023, 112 (2), pp.687-720. ⟨10.1007/s10994-022-06277-7⟩
Short abstract
For many machine-learning tasks, augmenting the data table at hand with features built from external …..
Full text and BibTeX
https://hal.science/hal-03848124/file/main.pdf BibTex

Book sections

Title
Evaluating machine learning models and their diagnostic value
Authors
Gaël Varoquaux, Olivier Colliot
Reference
Olivier Colliot. Machine Learning for Brain Disorders, Springer, 2023
Short abstract
This chapter describes model validation, a crucial part of machine learning whether it is to select …..
Full text and BibTeX
https://hal.science/hal-03682454/file/Chapter%2020%20-%20Final.pdf BibTex

2022

Journal articles

Title
Machine learning for medical imaging: methodological failures and recommendations for the future
Authors
Gaël Varoquaux, Veronika Cheplygina
Reference
npj Digital Medicine, 2022, 5 (1), pp.48. ⟨10.1038/s41746-022-00592-y⟩
Short abstract
Research in computer analysis of medical images bears many promises to improve patients’ health. H …..
BibTeX
https://arxiv.org/pdf/2103.10292 BibTex

Title
Causal effect on a target population: a sensitivity analysis to handle missing covariates
Authors
Bénédicte Colnet, Julie Josse, Gaël Varoquaux, Erwan Scornet
Reference
Journal of Causal Inference, 2022, 10 (1), pp.372-414. ⟨10.1515/jci-2021-0059⟩
Short abstract
Randomized Controlled Trials (RCTs) are often considered as the gold standard to conclude on the cau …..
Full text and BibTeX
https://hal.science/hal-03473691/file/JCI-version-finale.pdf BibTex

Title
How to remove or control confounds in predictive models, with applications to brain biomarkers
Authors
Darya Chyzhyk, Gaël Varoquaux, Michael Milham, Bertrand Thirion
Reference
GigaScience, 2022, 11, ⟨10.1093/gigascience/giac014⟩
Short abstract
Background: With increasing data sizes and more easily available computational methods, neuroscienc …..
Full text and BibTeX
https://inria.hal.science/hal-03607651/file/giac014.pdf BibTex

Title
Analytics on Non-Normalized Data Sources: more Learning, rather than more Cleaning
Authors
Alexis Cvetkov-Iliev, Alexandre Allauzen, Gaël Varoquaux
Reference
IEEE Access, In press, 10, pp.42420-42431. ⟨10.1109/ACCESS.2022.3168013⟩
Short abstract
Data analysis is increasingly performed over data assembled from uncontrolled sources, facing incons …..
Full text and BibTeX
https://hal.science/hal-03647434/file/final.pdf BibTex

Title
Benchmarking missing-values approaches for predictive models on health databases
Authors
Alexandre Perez-Lebel, Gaël Varoquaux, Marine Le Morvan, Julie Josse, Jean-Baptiste Poline
Reference
GigaScience, In press, ⟨10.1093/gigascience/giac013⟩
Short abstract
BACKGROUND: As databases grow larger, it becomes harder to fully control their collection, and they …..
Full text and BibTeX
https://hal.science/hal-03526292/file/Benchmarking%20missing-values%20approaches%20for%20predictive%20models%20on%20health%20databases.pdf BibTex

2021

Journal articles

Title
Preventing dataset shift from breaking machine-learning biomarkers
Authors
Jérôme Dockès, Gaël Varoquaux, Jean-Baptiste Poline
Reference
GigaScience, In press, ⟨10.1093/gigascience/giab055⟩
Short abstract
Machine learning brings the hope of finding new biomarkers extracted from cohorts with rich biomedic …..
Full text and BibTeX
https://hal.science/hal-03293375/file/main.pdf BibTex

Conference papers

Title
AI as statistical methods for imperfect theories
Authors
Gaël Varoquaux
Reference
NeurIPS 2021 – 35th Conference on Neural Information Processing Systems. Workshop: AI for Science, Dec 2021, Virtual, France
Short abstract
Science has progressed by reasoning on what models could not predict because they were missing impor …..
Full text and BibTeX
https://hal.science/hal-03474791/file/paper.pdf BibTex

Title
What’s a good imputation to predict with missing values?
Authors
Marine Le Morvan, Julie Josse, Erwan Scornet, Gaël Varoquaux
Reference
NeurIPS 2021 – 35th Conference on Neural Information Processing Systems, Dec 2021, Virtual, France
Short abstract
How to learn a good predictor on data with missing values? Most efforts focus on first imputing as w …..
Full text and BibTeX
https://hal.science/hal-03243931/file/LeMorvan2021_ImputeThenRegress.pdf BibTex

Title
Accounting for variance in machine learning benchmarks
Authors
Xavier Bouthillier, Pierre Delaunay, Mirko Bronzi, Assya Trofimov, Brennan Nichyporuk, Justin Szeto, Naz Sepah, Edward Raff, Kanika Madan, Vikram Voleti, Samira Ebrahimi Kahou, Vincent Michalski, Dmitriy Serdyuk, Tal Arbel, Chris Pal, Gaël Varoquaux, Pascal Vincent
Reference
MLSys 2021 – 4th Conference on Machine Learning and Systems, Apr 2021, San Francisco (virtual), United States
Short abstract
Strong empirical evidence that one machine-learning algorithm A outperforms another one B ideally ca …..
Full text and BibTeX
https://hal.science/hal-03177159/file/main.pdf BibTex

Title
A lightweight neural model for biomedical entity linking
Authors
Lihu Chen, Gaël Varoquaux, Fabian Suchanek
Reference
AAAI 2021 – The Thirty-Fifth Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence, Feb 2021, Palo Alto (virtual), United States. pp.12657-12665
Short abstract
Biomedical entity linking aims to map biomedical mentions, such as diseases and drugs, to standard e …..
Full text and BibTeX
https://hal.science/hal-03086044/file/Biomedical_Entity_Linking.pdf BibTex

2020

Journal articles

Title
Tropical Cyclone Track Forecasting using Fused Deep Learning from Aligned Reanalysis Data
Authors
Sophie Giffard-Roisin, Mo Yang, Guillaume Charpiat, Christina Kumler Bonfanti, Balázs Kégl, Claire Monteleoni
Reference
Frontiers in Big Data, 2020, 3, pp.1. ⟨10.3389/fdata.2020.00001⟩
Short abstract
The forecast of tropical cyclone trajectories is crucial for the protection of people and property. …..
Full text and BibTeX
https://hal.science/hal-02329437/file/Frontiers_journal_author_version.pdf BibTex

Title
An Experimental Study of State-of-the-Art Entity Alignment Approaches
Authors
Xiang Zhao, Weixin Zeng, Jiuyang Tang, Wei Wang, Fabian Suchanek
Reference
IEEE Transactions on Knowledge and Data Engineering, 2020, ⟨10.1109/TKDE.2020.3018741⟩
Short abstract
Entity alignment (EA) finds equivalent entities that are located in different knowledge graphs (KGs) …..
Full text and BibTeX
https://imt.hal.science/hal-03108522/file/tkde-2020.pdf BibTex

Title
Encoding high-cardinality string categorical variables
Authors
Patricio Cerda, Gaël Varoquaux
Reference
IEEE Transactions on Knowledge and Data Engineering, In press, ⟨10.1109/TKDE.2020.2992529⟩
Short abstract
Statistical models usually require vector representations of categorical variables, using for instan …..
Full text and BibTeX
https://inria.hal.science/hal-02171256/file/article.pdf BibTex

Conference papers

Title
NeuMiss networks: differentiable programming for supervised learning with missing values
Authors
Marine Le Morvan, Julie Josse, Thomas Moreau, Erwan Scornet, Gaël Varoquaux
Reference
NeurIPS 2020 – 34th Conference on Neural Information Processing Systems, Dec 2020, Vancouver / Virtual, Canada
Short abstract
The presence of missing values makes supervised learning much more challenging. Indeed, previous wor …..
Full text and BibTeX
https://hal.science/hal-02888867/file/main.pdf BibTex

Title
Linear predictor on linearly-generated data with missing values: non consistency and solutions
Authors
Marine Le Morvan, Nicolas Prost, Julie Josse, Erwan Scornet, Gaël Varoquaux
Reference
AISTATS 2020 – International Conference on Artificial Intelligence and Statistics, Aug 2020, Online, France. pp.3165-3174
Short abstract
We consider building predictors when the data have missing values. We study the seemingly-simple cas …..
Full text and BibTeX
https://hal.science/hal-02464569/file/aistats.pdf BibTex

2019

Conference papers

Title
Comparing distributions: $\ell_1$ geometry improves kernel two-sample testing
Authors
Meyer Scetbon, Gaël Varoquaux
Reference
NeurIPS 2019 – 33rd Conference on Neural Information Processing Systems, Dec 2019, Vancouver, Canada
Full text and BibTeX
https://inria.hal.science/hal-02292545/file/NIPS_L1_test-HAL-v2%20%281%29.pdf BibTex

2018

Journal articles

Title
Atlases of cognition with large-scale human brain mapping
Authors
Gaël Varoquaux, Yannick Schwartz, Russell A Poldrack, Baptiste Gauthier, Danilo Bzdok, Jean-Baptiste Poline, Bertrand Thirion
Reference
PLoS Computational Biology, 2018, 14 (11), pp.e1006565. ⟨10.1371/journal.pcbi.1006565⟩
Short abstract
To map the neural substrate of mental function, cognitive neuroimaging relies on controlled psycholo …..
Full text and BibTeX
https://inserm.hal.science/inserm-02146700/file/journal.pcbi.1006565.pdf BibTex

Title
Similarity encoding for learning with dirty categorical variables
Authors
Patricio Cerda, Gaël Varoquaux, Balázs Kégl
Reference
Machine Learning, 2018, ⟨10.1007/s10994-018-5724-2⟩
Short abstract
For statistical learning, categorical variables in a table are usually considered as discrete entiti …..
Full text and BibTeX
https://inria.hal.science/hal-01806175/file/article_hal.pdf BibTex
