Abstract
The ability to ask questions is a powerful tool for gathering information in order to learn about the world and resolve ambiguities. In this paper, we explore the novel problem of generating discriminative questions to help disambiguate visual instances. Our work can be seen as a complement to, and a new extension of, the rich body of research on image captioning and question answering. We introduce the first large-scale dataset with over 10,000 carefully annotated tuples of image pairs and questions to facilitate benchmarking. In particular, each tuple consists of a pair of images together with, on average, 4.6 discriminative questions (as positive samples) and 5.9 non-discriminative questions (as negative samples). In addition, we present an effective method for visual discriminative question generation. The method can be trained in a weakly supervised manner, requiring no discriminative image-question annotations but only existing visual question answering datasets. Promising results against representative baselines are shown through quantitative evaluations and user studies.
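To make the annotation structure concrete, the following is a minimal Python sketch of one dataset tuple as described above: a pair of images plus discriminative (positive) and non-discriminative (negative) questions. The class name, field names, and example values are illustrative assumptions and do not reflect the released data format.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class VDQGTuple:
    """Hypothetical representation of one annotated tuple (image pair + questions)."""
    image_a_path: str  # first image of the pair
    image_b_path: str  # second image of the pair
    # Questions that distinguish the two images (~4.6 per tuple on average).
    positive_questions: List[str] = field(default_factory=list)
    # Questions that do not distinguish them (~5.9 per tuple on average).
    negative_questions: List[str] = field(default_factory=list)

# Example usage with placeholder values.
example = VDQGTuple(
    image_a_path="images/pair_0001_a.jpg",
    image_b_path="images/pair_0001_b.jpg",
    positive_questions=["What color is the dog?"],
    negative_questions=["Is there an animal in the picture?"],
)
```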
Original language | English |
---|---|
Title of host publication | Proceedings - 2017 IEEE International Conference on Computer Vision, ICCV 2017 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 3439-3448 |
Number of pages | 10 |
ISBN (Electronic) | 9781538610329 |
DOIs | |
Publication status | Published - Dec 22 2017 |
Externally published | Yes |
Event | 16th IEEE International Conference on Computer Vision, ICCV 2017 - Venice, Italy. Duration: Oct 22 2017 → Oct 29 2017 |
Publication series
Name | Proceedings of the IEEE International Conference on Computer Vision |
---|---|
Volume | 2017-October |
ISSN (Print) | 1550-5499 |
Conference
Conference | 16th IEEE International Conference on Computer Vision, ICCV 2017 |
---|---|
Country/Territory | Italy |
City | Venice |
Period | 10/22/17 → 10/29/17 |
Bibliographical note
Publisher Copyright: © 2017 IEEE.
ASJC Scopus Subject Areas
- Software
- Computer Vision and Pattern Recognition