PhD defense of Mateusz Budnik – Friday 24 February 2017

================================================================

Dear all,

I am pleased to invite you to my thesis defense, entitled "Active and Deep Learning for Multimedia", supervised by Laurent Besacier and Georges Quénot.

The defense will be held on Friday 24 February 2017 at 2:30 pm in the auditorium of the IMAG building (ground floor), 700 avenue Centrale, 38400 Saint-Martin-d’Hères (university campus). It will be followed by light refreshments, to which you are also invited.

The presentation will be in English.

Abstract

The main topics addressed in this thesis are the use of active learning and deep learning methods for multimodal document retrieval. The contributions proposed in this thesis address both topics. An active learning framework was introduced that allows more efficient annotation of broadcast TV videos through label propagation, the use of multimodal data, and effective selection strategies. Several scenarios and experiments were considered in the context of person identification in videos, covering different modalities (such as faces, speech segments and overlaid text) and different selection strategies. The whole system was additionally validated in a dry-run test involving real human annotators.

A second major contribution was the investigation and use of deep learning (in particular convolutional neural networks) for video information retrieval. A comprehensive study was carried out using different neural network architectures and different training techniques, such as fine-tuning the network or training more classical classifiers like SVMs. Learned features (the output of neural networks) were compared with engineered features. Despite the lower performance of the latter, fusing the two types of features increased overall performance.

Finally, the use of convolutional neural networks for speaker identification from spectrograms was explored. The results were compared to those obtained with other state-of-the-art speaker identification systems. Different fusion approaches were also tested. The proposed approach obtained results comparable to those of some of the other tested approaches and offered an increase in performance when fused with the output of the best system.