Marta Mattoso has been a Professor in the Department of Computer Science at the COPPE Institute of the Federal University of Rio de Janeiro (UFRJ) since 1994, where she leads the Distributed Data Science Research Group. She has been active in the database and provenance research community for more than twenty years, and her current research interests include distributed and parallel big data analysis, data management aspects of scientific workflows, and provenance data.
Marta, can you tell us about your research at COPPE/UFRJ?
My research projects include the development and implementation of distributed large-scale data solutions that support human-in-the-loop data analysis during the parallel execution of many-task computing workloads in high-performance computing environments. These solutions have been evaluated on Oil & Gas and Bioinformatics experiments with real problems and data. Currently, I am interested in interpretability in data science experiments.
And what are your ties with France?
Besides my affection for French culture, my admiration for French scientific achievements, and having a son who graduated from Ecole Polytechnique (X-2007), my professional ties with France started six years after I finished my PhD in Brazil, when I was already a professor at COPPE/UFRJ.
When did you start your collaboration with Inria?
My collaboration with Inria started in 1999, when I joined the team of the BR-FR collaboration project named Ecobase. I sent my PhD student Maria Claudia Reis to do a doctoral internship at Inria-Rocquencourt, and through the development of her thesis we started to collaborate. This successful Ecobase collaboration was followed by a more focused project, the Kiwi Project, from 2001 to 2003, where I led the Brazilian side and Patrick Valduriez led the French side at Inria-Rocquencourt, where he was based at that time. Then, I continued leading Brazilian teams in six additional bilaterally funded projects (CAPES, CNPq and FAPERJ from Brazil), all in collaboration with Patrick Valduriez and Esther Pacitti, among other researchers. These projects were, with Inria-Rennes at Nantes (Atlas Team): DaaD [2004-2007], then GridData [2007-2008]. The collaboration continued in the following years, when I coordinated the Brazilian side with Inria-Sophia at Montpellier (Zenith Team): Datluge [2009-2010], then three Associate Team projects: Sarava [2010-2012], followed by SwfP2Pcloud [2012-2014], then SciDisc [2017-2019], funded unilaterally by Inria.
From 2012 to 2018, I was part of the Brazilian team in the following projects that also involved the Inria Zenith Team: Music (Associate Team); Hoscar (Inria-Brésil); HPC4e (EU-Brazil). Currently, I am part of the HPDaSc Associate Team project, also with Inria Zenith, led by Fabio Porto from LNCC, Brazil.
Can you tell us more about the current Associate Team HPDaSc? Is it a continuation of the previous collaborations between Inria, UFRJ and LNCC (SciDISC, MUSIC, …)?
In fact, I am very happy to see that an initiative that started in Brazil with PUC-Rio and COPPE/UFRJ back in 2001 has led to seven successful joint projects with Inria, coordinated by COPPE/UFRJ, including three Associate Teams. In the last ten years, these joint projects have expanded to include other institutions such as CEFET-RJ, LNCC, and UFF, as their database groups were created. HPDaSc, led by Fabio Porto at LNCC, is the fifth edition of Inria’s Associate Team initiative between Brazil and Atlas/Zenith, preceded by Sarava, SwfP2Pcloud, Music and SciDISC. HPDaSc continues the high-performance scientific data management topic, but it also addresses exciting new research challenges in data science, particularly deep learning. HPDaSc assembles a new generation of talented researchers and students from both countries, and the collaboration is already quite productive.
What do you think this collaboration has brought to you? And to COPPE/UFRJ?
COPPE/UFRJ has a long tradition of collaboration with Inria, starting back in the early 1980s with Prof. Claude Marechal from Inria and our Prof. Paulo Roberto Oliveira, in addition to other collaborations with our Emeritus Prof. Nelson Maculan. Recently, Prof. Daniel Figueiredo coordinated the Thanes (https://team.inria.fr/thanes/) Associate Team.
I particularly like to do collaborative research, benefiting from different areas of expertise. The collaboration with Inria, under Patrick Valduriez’s leadership, has broadened the exchange of ideas, with hard work from both sides involving many internationally well-known researchers. Joint supervision allowed COPPE/UFRJ students to work with leading scientists in specific domains and to interact with students from different parts of the world while visiting Patrick’s group at Inria. Having access to Inria’s Grid 5000 high-performance machines has also allowed us to do large-scale evaluations of our research. The mobility provided by these projects’ funds has fostered Brazilian participation in top-quality conferences and talks, combined with our trips to Inria in France.
Just as an example, Prof. Eduardo Ogasawara, who currently leads the CEFET-RJ group in the HPDaSc Associate Team, is a former PhD student from COPPE/UFRJ, supervised by Patrick Valduriez and myself under the Associate Team Sarava project. Our project won the Best Paper Award in 2009 at the Colibri Colloquium “COLIBRI – Colóquio em Informática: Brasil/INRIA, Cooperações, Avanços e Desafios”, co-located with the 30th Brazilian Computer Science Conference (CBC-SBC). Colibri was part of the “Year of France in Brazil” celebrations.
Overall, these 20 fruitful years of collaboration between our group and Inria’s Zenith group have produced important results on exciting research challenges in data management with high-performance computing. These project topics attract bright PhD students to COPPE/UFRJ; three of them, who did internships at Inria, won best thesis awards at the prestigious Brazilian database conference (SBBD). Together with our jointly supervised students, our group at COPPE has co-authored with Zenith more than 20 journal papers and 20 conference papers, in addition to several open source software prototypes validated with real problems in different scientific domains. Having these joint papers publicly available in Inria’s HAL open archive has definitely helped to disseminate our joint work.
And finally, what would be the next step of this collaboration, from your point of view?
The advantage of working on state-of-the-art topics that are closely related to real problems is that there are always research challenges with real large-scale data to be addressed. Since the database groups from the two countries have expanded with a balance between young and senior researchers, I continue to foresee a bright scientific future for this collaboration. From my point of view, the next steps are towards addressing the challenges of the new generation of heterogeneous high-performance computing environments, in order to efficiently process and analyze data in combination with machine learning advances. Once again, having supercomputers at COPPE, LNCC and Inria provides us with the high-performance infrastructure to work on these topics.