ㆍPublisher: Chung-Ang University Humanities Research Institute ㆍJournal: 인공지능인문학연구 (AI Humanities Studies), Vol. 2 ㆍAuthors: Miriam Cha, Youngjune L. Gwon, H. T. Kung
Sparse coding has been applied successfully in single-modality settings. We consider a sparse coding framework for multimodal representation learning. Our framework aims to capture the semantic correlation between different data types via joint sparse coding. Such joint optimization induces a unified representation that is sparse and shared across modalities. In particular, we compute joint, cross-modal, and stacked cross-modal sparse codes. We find that these representations are robust to noise and provide greater flexibility in modeling features for multimodal input. A good multimodal framework should be able to fill in a missing modality given the other and improve representational efficiency. We demonstrate the missing-modality case through image denoising and show the effectiveness of cross-modal sparse codes in uncovering the relationship between clean and corrupted image pairs. Furthermore, we experiment with multi-layer sparse coding to learn highly nonlinear relationships. The effectiveness of our approach is also demonstrated in multimedia event detection and retrieval on the TRECVID dataset (audio-video), category classification on the Wikipedia dataset (image-text), and sentiment classification on the PhotoTweet dataset (image-text).
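The core idea of joint sparse coding, as described above, is to concatenate the feature vectors of two modalities and learn a single dictionary over the concatenation, so that one sparse code jointly explains both modalities. The following is a minimal illustrative sketch of that idea using scikit-learn's `DictionaryLearning`; the toy data, dimensions, and hyperparameters (`n_components`, `alpha`) are assumptions for illustration, not the authors' actual configuration.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Hypothetical toy data: 100 paired samples from two modalities
# (standing in for, e.g., audio and video features).
rng = np.random.RandomState(0)
X_audio = rng.randn(100, 20)
X_video = rng.randn(100, 30)

# Joint sparse coding: concatenate the modalities so each learned
# dictionary atom spans both, and each sparse code is shared.
X_joint = np.hstack([X_audio, X_video])  # shape (100, 50)

dico = DictionaryLearning(
    n_components=16,                  # assumed dictionary size
    alpha=1.0,                        # sparsity weight during learning
    transform_algorithm="lasso_lars",
    transform_alpha=0.5,              # sparsity weight during encoding
    random_state=0,
)
codes = dico.fit_transform(X_joint)   # joint sparse codes, shape (100, 16)

# The codes are sparse: many coefficients are exactly zero.
print(codes.shape, float(np.mean(codes == 0)))
```

A missing modality can then be filled in by encoding against only the observed modality's rows of the dictionary and reconstructing the other modality's rows from the resulting code, which is the intuition behind the cross-modal codes mentioned in the abstract.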