Revision as of 10 April 2018, 12:33

Discovering Connotations as Labels for Weakly Supervised Image-Sentence Data

Aditya Mogadala, Bhargav Kanuparthi, Achim Rettinger, York Sure-Vetter

Published: 2018 April

Book title: The Web Conference (Cognitive Computing Track)
Publisher: ACM

Refereed publication

BibTeX

Abstract
The growth of multimodal content on the web and social media has generated abundant weakly aligned image-sentence pairs. However, it is hard to interpret them directly due to their intrinsic “intension”. In this paper, we aim to annotate such image-sentence pairs with connotations as labels to capture the intrinsic “intension”. We achieve this with a connotation multimodal embedding model (CMEM) using a novel loss function. Its unique characteristics over previous models include: (i) the exploitation of multimodal data as opposed to only visual information, (ii) robustness to outlier labels in a multi-label scenario, and (iii) effectiveness on large-scale weakly supervised data. With extensive quantitative evaluation, we demonstrate the effectiveness of CMEM for detection of multiple labels over other state-of-the-art approaches. We also show that, in addition to annotating image-sentence pairs with connotation labels, a byproduct of our model inherently supports cross-modal retrieval, i.e., retrieving sentences for an image query.

Download: Media:Ctp147-mogadalaA.pdf

Research group

Web Science

Research area

Information Retrieval, Machine Learning, Artificial Intelligence, WWW Systems

@@ Line 24: / Line 24: @@
 }}
 {{Publikation Details
-|Abstract=We address the task of labeling image-sentence pair at large-scale
-with varied concepts representing connotations. That is for any
-given query image-sentence, we aim to annotate them with the
-connotations that capture intrinsic intension. To achieve it, we pro-
-pose a Connotation multimodal embedding model (CMEM) with a
-novel loss function. Its unique characteristics over previous models
-include (i) can leverage multimodal data as opposed to only visual
-information, (ii) robust to outlier labels in a multi-label scenario
-and (iii) works well with large-scale weakly supervised data. With
-extensive quantitative evaluation, we exhibit the effectiveness of
-CMEM for detection of multiple labels over other state-of-the-art
-approaches. Also, we show that in addition to annotation of images
-with connotation labels, our byproduct of the model inherently
-supports cross-modal retrieval.
-|Download=Ctp147-mogadalaA.pdf,
+|Abstract=Growth of multimodal content on the web and social media has
+generated abundant weakly aligned image-sentence pairs. However, it is hard to interpret them directly due to intrinsic “intension”. In this paper, we aim to annotate such image-sentence pairs with connotations as labels to capture the intrinsic “intension”. We achieve it with a connotation multimodal embedding model (CMEM) using a novel loss function. It’s unique characteristics over previous models include: (i) the exploitation of multimodal data as opposed to only visual information, (ii) robustness to outlier labels in a multi-label scenario and (iii) works effectively with large-scale weakly supervised data. With extensive quantitative evaluation, we exhibit the effectiveness of CMEM for detection of multiple labels over other state-of-the-art approaches. Also, we show that in addition to annotation of image-sentence pairs with connotation labels, byproduct of our model inherently supports cross-modal retrieval i.e. image query - sentence retrieval.
+|Download=Ctp147-mogadalaA.pdf,
 |Forschungsgruppe=Web Science
 }}

Inproceedings3598: Difference between revisions
