Self-paced learning for exploiting noisy, diverse or incomplete data
- Dr. Nikos Paragios, GALEN project-team, Inria Saclay Île-de-France
- Prof. Daphne Koller, Stanford University
Supervised learning requires that all samples in the training dataset be fully annotated. In many cases, such a dataset is either impossible to obtain (e.g., in computational biology, where the ground-truth annotation is unknown) or prohibitively expensive to collect (e.g., segmenting millions of images). SPLENDID aims to move towards a learning paradigm that reflects the true availability of data in real life. The goal of SPLENDID is thus to develop machine learning techniques that exploit the information present in the following three related categories of data: (i) diverse data, where some training samples are fully supervised while others are weakly supervised; (ii) incomplete data, where the training samples have not been fully annotated; and (iii) noisy data, where some of the training samples may be labeled incorrectly.
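To make the self-paced learning idea concrete, the following is a minimal illustrative sketch (not the project's actual implementation): the learner alternates between selecting "easy" samples whose loss under the current model falls below a threshold, refitting on those samples, and gradually raising the threshold to admit harder (and possibly noisier) samples. The function name, least-squares model, and parameter values are invented for illustration only.

```python
import numpy as np

def self_paced_fit(X, y, lam0=0.5, growth=1.3, n_rounds=10):
    """Illustrative self-paced learning loop for least-squares regression.

    Alternates between (a) selecting "easy" samples whose current
    per-sample loss is below the threshold lam, and (b) refitting the
    model on those samples, while annealing lam upward so that harder
    samples are progressively admitted.
    """
    n, d = X.shape
    w = np.zeros(d)          # initial model
    lam = lam0               # initial easiness threshold
    for _ in range(n_rounds):
        losses = (X @ w - y) ** 2        # per-sample loss under current model
        v = losses < lam                 # binary sample weights: True = "easy"
        if v.any():
            # refit the model on the currently selected easy samples
            w, *_ = np.linalg.lstsq(X[v], y[v], rcond=None)
        lam *= growth                    # anneal: admit harder samples next round
    return w
```

Because grossly mislabeled samples keep a large loss throughout, they tend to stay above the threshold and are effectively excluded, which is one intuition for why self-paced learning can cope with noisy annotations.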
SPLENDID results span the following areas: Dissimilarity coefficient learning, Parameter estimation for random walks, Parameter estimation for region-based models, and Local symmetries for contour and object reconstruction.
Publications and Awards:
- 1 Journal article.
- 2 Conference papers.
- API for Modeling Latent Variable Uncertainty for Loss-based Learning
Although Prof. Daphne Koller left Stanford University to work full-time as the CEO of Coursera, GALEN has continued its research on this topic with IIIT Hyderabad (supported by a CEFIPRA grant). Preliminary results of this work appeared in leading conferences on machine learning (NIPS 2014) and computer vision (ECCV 2014 and CVPR 2014).