Focus on a joint research project: LEGO


LEGO (Since 2016)

LEarning GOod representations for natural language processing  


Principal Investigators: 

  • Aurélien Bellet, MAGNET project-team, Inria Lille
  • Fei Sha, University of Southern California, Department of Computer Science (formerly with TEDS, UCLA)

Research objectives:

LEGO lies at the intersection of Machine Learning and Natural Language Processing (NLP). It addresses two questions: what are the right representations for structured data and how can they be learned automatically, and how can such representations be applied to complex, structured prediction tasks in NLP? LEGO relies on the complementary expertise of the two partners in representation learning, structured prediction, graph-based learning, and statistical NLP to offer a novel alternative to existing techniques. The team intends to advance the state of the art in several core NLP problems, such as dependency parsing, coreference resolution, and discourse parsing.

Scientific achievements:

The contributions of LEGO span several research directions. The team has proposed methods to learn or adapt word representations for specific tasks (implicit discourse relation identification, dependency parsing, text classification), sometimes exploiting richer language contexts than simple word co-occurrences. LEGO has also developed methods to transfer word representations between related tasks. Finally, the team is developing approaches to learn word embeddings from multi-modal inputs (such as representations of visual objects extracted from images, in addition to text corpora).

Publications and Awards:

  • 6 conference papers
  • First joint article in preparation

Selected Publication:

  • Melissa Ailem, Bowen Zhang, Aurélien Bellet, Pascal Denis and Fei Sha. A Probabilistic Model for Joint Learning of Word Embeddings from Texts and Images. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018).