Representation learning through cross-modality supervision

Abstract

Learning robust representations for applications with multiple input modalities can have a significant impact on their performance. Traditional representation learning methods project the input modalities onto a common subspace to maximize agreement among the modalities for a particular task. We propose a novel approach to representation learning that uses a latent representation decoder to reconstruct the target modality, thereby employing the target modality purely as a supervision signal for discovering correlations between the modalities. Through cross-modality supervision, we demonstrate that the learnt representation improves performance on facial action unit (AU) recognition compared with modality-specific representations and even their fused counterparts. Our experiments on three AU recognition datasets, MMSE, BP4D and DISFA, show strong performance gains, producing state-of-the-art results in spite of the absence of a modality.
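The sketch below is a minimal illustration of the cross-modality supervision idea described in the abstract: one modality is encoded into a latent representation, a decoder reconstructs the other (target) modality from that latent code so it acts purely as a training-time supervision signal, and an AU classifier operates on the shared latent space. The module names, feature dimensions, and simple MLP architectures are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch of cross-modality supervision (not the authors' code).
import torch
import torch.nn as nn

class CrossModalitySupervisedModel(nn.Module):
    def __init__(self, in_dim=512, latent_dim=128, target_dim=256, num_aus=12):
        super().__init__()
        # Encoder maps the available input modality to a latent representation.
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        # Decoder reconstructs the *target* modality from the latent code,
        # forcing the latent space to capture cross-modal correlations.
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, target_dim))
        # AU recognition head operates on the shared latent representation.
        self.classifier = nn.Linear(latent_dim, num_aus)

    def forward(self, x):
        z = self.encoder(x)
        return self.classifier(z), self.decoder(z)

model = CrossModalitySupervisedModel()
bce = nn.BCEWithLogitsLoss()   # multi-label AU recognition loss
mse = nn.MSELoss()             # reconstruction loss against the target modality
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# One training step: the target modality appears only in the loss, never as input.
x_input = torch.randn(8, 512)        # input modality features (assumed visual)
x_target = torch.randn(8, 256)       # supervising modality features (assumed, e.g. thermal)
au_labels = torch.randint(0, 2, (8, 12)).float()

au_logits, target_recon = model(x_input)
loss = bce(au_logits, au_labels) + mse(target_recon, x_target)
opt.zero_grad(); loss.backward(); opt.step()

# At inference the second modality is absent: only the encoder and classifier run.
```

Because the decoder is used only to compute the reconstruction loss, the second modality can be dropped at test time, which matches the abstract's claim of strong results "in spite of the absence of a modality".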

Publication
In IEEE International Conference on Automatic Face & Gesture Recognition, 2019
Deen Dayal Mohan
PhD student, Department of Computer Science

My research interests include computer vision, multimodal representation learning and biometrics.