Domain adaptive representation learning for facial action unit recognition

Abstract

Learning robust representations for applications with multiple input modalities can significantly improve performance. Traditional representation learning methods project the input modalities into a common subspace to maximize agreement among the modalities for a particular task. We propose a novel approach to representation learning that uses a latent representation decoder to reconstruct the target modality, thereby employing the target modality purely as a supervision signal for discovering correlations between the modalities. Through cross-modality supervision, we demonstrate that the learnt representation improves performance on facial action unit (AU) recognition over modality-specific representations and even their fused counterparts. As an extension, we explore a new transfer learning technique to adapt the learnt representation to the target domain. We also present a shared-representation-based feature fusion methodology to improve the performance of any multi-modal system. Our experiments on three AU recognition datasets, MMSE, BP4D and DISFA, show strong performance gains, producing state-of-the-art results in spite of the absence of data from a modality.
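The sketch below illustrates the core idea of cross-modality supervision described above: an encoder maps the primary modality to a latent representation, a decoder reconstructs the target modality from that latent code (so the target modality acts only as a training signal), and an AU classification head operates on the shared representation. This is a minimal illustrative example; the module names, feature dimensions, and loss weighting are assumptions and do not reflect the exact architecture in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalitySupervisionModel(nn.Module):
    """Sketch of cross-modality supervised representation learning.

    The encoder consumes features from the primary modality; the decoder
    reconstructs features of the target modality, which is therefore needed
    only during training, not at inference time.
    """

    def __init__(self, in_dim=512, latent_dim=128, target_dim=512, num_aus=12):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, target_dim),
        )
        self.au_head = nn.Linear(latent_dim, num_aus)  # multi-label AU logits

    def forward(self, x_primary):
        z = self.encoder(x_primary)        # shared latent representation
        recon = self.decoder(z)            # reconstruction of the target modality
        au_logits = self.au_head(z)        # AU predictions from the latent code
        return z, recon, au_logits


def training_step(model, x_primary, x_target, au_labels, alpha=1.0):
    """Hypothetical training step: AU loss plus reconstruction loss,
    with alpha controlling the strength of the cross-modality supervision."""
    _, recon, au_logits = model(x_primary)
    recon_loss = F.mse_loss(recon, x_target)
    au_loss = F.binary_cross_entropy_with_logits(au_logits, au_labels)
    return au_loss + alpha * recon_loss
```

At test time only the encoder and AU head are used, which is consistent with the claim that the method remains effective despite the absence of data from a modality.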

Publication
In Pattern Recognition, June 2020.
Deen Dayal Mohan
PhD student, Department of Computer Science

My research interests include computer vision, multimodal representation learning, and biometrics.