Bridge Correlational Neural Networks for Multilingual Multimodal Representation Learning

10/13/2015
by Janarthanan Rajendran, et al.

Recently there has been a lot of interest in learning common representations for multiple views of data. Typically, such common representations are learned using a parallel corpus between the two views (say, 1M images and their English captions). In this work, we address a real-world scenario where no direct parallel data is available between two views of interest (say, V_1 and V_2), but parallel data is available between each of these views and a pivot view (V_3). We propose a model for learning a common representation for V_1, V_2 and V_3 using only the parallel data available between V_1 and V_3 and between V_2 and V_3. The proposed model is generic and even works when there are n views of interest and only one pivot view acting as a bridge between them. We focus on two specific downstream applications: (i) transfer learning between languages L_1, L_2, ..., L_n using a pivot language L, and (ii) cross-modal access between images and a language L_1 using a pivot language L_2. Our model achieves state-of-the-art performance in multilingual document classification on the publicly available multilingual TED corpus and promising results in multilingual multimodal retrieval on a new dataset created and released as a part of this work.
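The bridging idea in the abstract can be illustrated with a small sketch. This is not the paper's architecture (the actual Bridge Correlational Neural Network also uses reconstruction and correlation terms in its objective); it is a minimal, assumed toy setup with three linear encoders mapping views into a shared space, trained only on V_1-V_3 and V_2-V_3 parallel pairs, so that V_3 acts as the bridge between V_1 and V_2:

```python
import numpy as np

# Hedged sketch (not the paper's exact model): three linear encoders map
# views V1, V2, V3 into a shared k-dimensional space. Only V1-V3 and
# V2-V3 parallel pairs exist; V3 is the pivot, so V1 and V2 become
# comparable without any direct V1-V2 parallel data.
rng = np.random.default_rng(0)
d1, d2, d3, k = 8, 6, 10, 4   # view dimensions and shared-space dimension
n = 50                        # parallel pairs per view pair

# Synthetic parallel corpora: (x1, x3) pairs and (x2, x3') pairs.
X3a = rng.normal(size=(n, d3)); X1 = X3a @ rng.normal(size=(d3, d1))
X3b = rng.normal(size=(n, d3)); X2 = X3b @ rng.normal(size=(d3, d2))

W1 = rng.normal(size=(d1, k)) * 0.1
W2 = rng.normal(size=(d2, k)) * 0.1
W3 = rng.normal(size=(d3, k)) * 0.1

def loss():
    # Alignment objective: matched pairs should coincide in the shared
    # space. (The paper's objective adds reconstruction/correlation terms,
    # which among other things prevent the trivial all-zero solution.)
    return (np.mean((X1 @ W1 - X3a @ W3) ** 2)
            + np.mean((X2 @ W2 - X3b @ W3) ** 2))

lr = 0.01
initial = loss()
for _ in range(200):
    E13 = X1 @ W1 - X3a @ W3   # residuals on V1-V3 pairs
    E23 = X2 @ W2 - X3b @ W3   # residuals on V2-V3 pairs
    # Plain gradient descent on the alignment loss.
    W1 -= lr * (2 / (n * k)) * (X1.T @ E13)
    W2 -= lr * (2 / (n * k)) * (X2.T @ E23)
    W3 -= lr * (2 / (n * k)) * (-X3a.T @ E13 - X3b.T @ E23)

final = loss()
print(f"alignment loss: {initial:.4f} -> {final:.4f}")
```

After training, `x1 @ W1` and `x2 @ W2` live in the same space even though no V_1-V_2 pair was ever observed, which is what enables the transfer-learning and cross-modal retrieval applications described above.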
