Abstract
The ability to ask questions is a powerful tool for gathering information in order to learn about the world and resolve ambiguities. In this paper, we explore the novel problem of generating discriminative questions to help disambiguate visual instances. Our work can be seen as a complement to, and a new extension of, the rich body of research on image captioning and question answering. We introduce the first large-scale dataset with over 10,000 carefully annotated tuples of image pairs and questions to facilitate benchmarking. In particular, each tuple consists of a pair of images together with, on average, 4.6 discriminative questions (as positive samples) and 5.9 non-discriminative questions (as negative samples). In addition, we present an effective method for visual discriminative question generation. The method can be trained in a weakly supervised manner, requiring no discriminative image-question annotations but only existing visual question answering datasets. Promising results against representative baselines are shown through quantitative evaluations and user studies.
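To make the annotation structure concrete, the following is a minimal Python sketch of one dataset tuple as described above: a pair of images plus discriminative (positive) and non-discriminative (negative) questions. The class name, field names, and example values are illustrative assumptions and do not reflect the released data format.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class VDQGTuple:
    """Hypothetical representation of one annotated tuple (image pair + questions)."""
    image_a_path: str  # first image of the pair
    image_b_path: str  # second image of the pair
    # Questions that distinguish the two images (~4.6 per tuple on average).
    positive_questions: List[str] = field(default_factory=list)
    # Questions that do not distinguish them (~5.9 per tuple on average).
    negative_questions: List[str] = field(default_factory=list)

# Example usage with placeholder values.
example = VDQGTuple(
    image_a_path="images/pair_0001_a.jpg",
    image_b_path="images/pair_0001_b.jpg",
    positive_questions=["What color is the dog?"],
    negative_questions=["Is there an animal in the picture?"],
)
```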
Original language | English |
---|---|
Title of host publication | Proceedings - 2017 IEEE International Conference on Computer Vision, ICCV 2017 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 3439-3448 |
Number of pages | 10 |
ISBN (Electronic) | 9781538610329 |
DOIs | |
Publication status | Published - Dec 22 2017 |
Externally published | Yes |
Event | 16th IEEE International Conference on Computer Vision, ICCV 2017 - Venice, Italy. Duration: Oct 22 2017 → Oct 29 2017 |
Publication series
Name | Proceedings of the IEEE International Conference on Computer Vision |
---|---|
Volume | 2017-October |
ISSN (Print) | 1550-5499 |
Conference
Conference | 16th IEEE International Conference on Computer Vision, ICCV 2017 |
---|---|
Country/Territory | Italy |
City | Venice |
Period | 10/22/17 → 10/29/17 |
Bibliographical note
Publisher Copyright: © 2017 IEEE.
ASJC Scopus Subject Areas
- Software
- Computer Vision and Pattern Recognition