Gylfi received his Ph.D. from Rennes 1 University in 2013. The topic of his thesis was “Parallelism and Distribution for Very Large Scale Content Based Image Retrieval”. During this time, he worked with Hadoop and took part in experiments that involved hundreds of machines from Grid'5000 and as many as 100 million images (tens of billions of high-dimensional features).
Since September 2014, Gylfi has held an Inria@SiliconValley postdoctoral grant and a grant from the AMPLab of UC Berkeley. Together, these allow him to contribute to a research collaboration between the LinkMEDIA Inria project-team and the AMPLab.
Harnessing The Clouds to do Content-Based Image Retrieval
Content-based image retrieval, or CBIR, is still a very active field of research. One significant reason for this sustained interest is the spectacular growth and availability of digital media. Today's datasets are far larger than anyone could have imagined only a few years ago, and researchers have their hands full trying to keep up.
At web scale, which today typically means collections of hundreds of millions or even billions of images, CBIR clearly overlaps with another very active field of research: Big Data. This observation led to the cooperation between the LinkMEDIA team of Inria in Rennes and the AMPLab of the University of California, Berkeley.
The LinkMEDIA team carries out a wide range of research on multimedia-related topics. What LinkMEDIA brings to this cooperation is expertise accumulated over many years of research on content-based multimedia search.
AMPLab, on the other hand, focuses on large-scale problems and on finding simple solutions to Big Data challenges. AMPLab is the birthplace of Spark, a framework for “lightning-fast cluster computing” and arguably the biggest development in Big Data since Hadoop MapReduce.
Although the development of Spark has shifted to the open-source community, AMPLab is still a major contributor to its evolution.¹
The long-term goal of the cooperation between the LinkMEDIA team and AMPLab is to leverage what Big Data platforms such as Spark can offer next-generation web-scale CBIR systems and, potentially, to explore how such platforms could be enhanced to better accommodate multimedia applications.
In the short term, the goal is to implement on Spark an archetypal CBIR system that has previously been successfully adapted to and run on Hadoop. Not only will this allow a comparison of the two frameworks, but it will also be a vital building block for realizing other multimedia applications at large scale.