Title: Face processing for visual- and audio-visual speech
Abstract: It has long been established that visual perception plays a fundamental role in speech communication. In particular, vision provides an alternative representation of some of the information present in the audio, with the advantage that it is affected neither by acoustic noise nor by competing audio sources. The most prominent visual features used are facial movements, which combine rigid head motions and non-rigid facial deformations. On the one hand, head movements serve linguistic functions: they mark the structure of the ongoing discourse and are used to regulate interaction. On the other hand, lip and jaw movements are generated by facial muscles which, in turn, are controlled by speech production; they are correlated with phonemes and with word pronunciation. In this talk we address the problem of estimating head movements in order to remove them and synthesize a frontal, steady video of a face. We provide a practical solution based on robust estimation of the rigid head motion, and we show that the frontalized output is well suited for incorporation into an audio-visual speech enhancement pipeline.
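The abstract does not detail the estimation method, but the core idea, recovering the rigid head motion and inverting it while keeping the non-rigid lip and jaw deformations, can be illustrated with a standard least-squares alignment of 3-D facial landmarks to a frontal template (the Kabsch algorithm). The function names, the landmark-based setup, and the choice of algorithm are illustrative assumptions, not the authors' actual pipeline:

```python
import numpy as np

def estimate_rigid_motion(src, dst):
    """Estimate the rotation R and translation t mapping the (N, 3)
    point set src onto dst in the least-squares sense (Kabsch algorithm).
    Illustrative sketch; not the method used in the talk."""
    src_mean = src.mean(axis=0)
    dst_mean = dst.mean(axis=0)
    # Cross-covariance of the centered landmark sets.
    H = (src - src_mean).T @ (dst - dst_mean)
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t

def frontalize(landmarks, frontal_template):
    """Remove rigid head motion by aligning observed landmarks to a
    frontal template; non-rigid deformations (lips, jaw) survive the
    alignment because it is restricted to rotation + translation."""
    R, t = estimate_rigid_motion(landmarks, frontal_template)
    return landmarks @ R.T + t
```

In a full pipeline one would run this per frame on detected landmarks and warp the video accordingly; a robust variant (e.g. reweighting or RANSAC over landmarks) would downweight the lip and jaw points so that speech deformations do not bias the rigid estimate.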