Talk-to-Edit: Fine-Grained Facial Editing via Dialog

Yuming Jiang, Ziqi Huang, Xingang Pan, Chen Change Loy, Ziwei Liu^*

^*Corresponding author for this work

doi:10.1109/ICCV48922.2021.01354

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

99 Citations (Scopus)

Abstract

Facial editing is an important task in vision and graphics with numerous applications. However, existing works are unable to deliver a continuous and fine-grained editing mode (e.g., editing a slightly smiling face into a big laughing one) with natural interactions with users. In this work, we propose Talk-to-Edit, an interactive facial editing framework that performs fine-grained attribute manipulation through dialog between the user and the system. Our key insight is to model a continual “semantic field” in the GAN latent space. 1) Unlike previous works that regard editing as traversing straight lines in the latent space, we formulate fine-grained editing as finding a curving trajectory that respects the fine-grained attribute landscape on the semantic field. 2) The curvature at each step is location-specific and determined by the input image as well as the user's language requests. 3) To engage the user in a meaningful dialog, our system generates language feedback that considers both the user request and the current state of the semantic field. We also contribute CelebA-Dialog, a visual-language facial editing dataset, to facilitate large-scale study. Specifically, each image has manually annotated fine-grained attribute labels as well as template-based textual descriptions in natural language. Extensive quantitative and qualitative experiments demonstrate the superiority of our framework in terms of 1) the smoothness of fine-grained editing, 2) identity/attribute preservation, and 3) visual photorealism and dialog fluency. Notably, a user study shows that our overall system is consistently favored by around 80% of the participants. Our project page is https://www.mmlab-ntu.com/project/talkedit/.
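To make the contrast between straight-line and curved editing concrete, here is a toy sketch in Python. It is not the authors' implementation: the 2-D latent space, the hand-written `attribute_score` function (a stand-in for a learned attribute predictor), and the helpers `field_direction` and `edit_along_field` are all hypothetical. The sketch uses the normalized gradient of the score as the location-specific editing direction and re-evaluates it after every small step, stopping once the attribute score has risen by the requested amount `delta`. Talk-to-Edit itself models its semantic field in a GAN latent space and also conditions the edit on the user's language request.

```python
import numpy as np


def attribute_score(w: np.ndarray) -> float:
    """Hypothetical stand-in for an attribute predictor on a toy 2-D latent space."""
    # The sin term bends the level sets, so the best direction changes with location.
    return float(w[0] + 0.5 * np.sin(2.0 * w[1]))


def field_direction(w: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Location-specific edit direction: unit gradient of the score (central differences)."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        offset = np.zeros_like(w)
        offset[i] = eps
        grad[i] = (attribute_score(w + offset) - attribute_score(w - offset)) / (2.0 * eps)
    return grad / np.linalg.norm(grad)


def edit_along_field(w0, delta: float, step: float = 0.05, max_steps: int = 1000):
    """Follow the field in small steps until the score has increased by `delta`.

    Returns the final latent code and the visited path (one row per point).
    """
    w = np.asarray(w0, dtype=float).copy()
    target = attribute_score(w) + delta
    path = [w.copy()]
    for _ in range(max_steps):
        if attribute_score(w) >= target:
            break
        # Re-evaluate the direction at the current point: this is what curves the path.
        w = w + step * field_direction(w)
        path.append(w.copy())
    return w, np.stack(path)


w_start = np.array([0.0, 0.3])
w_end, path = edit_along_field(w_start, delta=1.0)
print(f"{len(path) - 1} steps, score {attribute_score(w_start):.3f} -> {attribute_score(w_end):.3f}")
```

Because the direction is recomputed at every location, the path bends as the local landscape changes; a straight-line edit would instead keep moving along the first step's direction `path[1] - path[0]` for the whole edit.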

Original language	English
Title of host publication	Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	13779-13788
Number of pages	10
ISBN (Electronic)	9781665428125
DOIs	https://doi.org/10.1109/ICCV48922.2021.01354
Publication status	Published - 2021
Externally published	Yes
Event	18th IEEE/CVF International Conference on Computer Vision, ICCV 2021 - Virtual, Online, Canada. Duration: Oct 11 2021 → Oct 17 2021

Publication series

Name	Proceedings of the IEEE International Conference on Computer Vision
ISSN (Print)	1550-5499

Conference

Conference	18th IEEE/CVF International Conference on Computer Vision, ICCV 2021
Country/Territory	Canada
City	Virtual, Online
Period	10/11/21 → 10/17/21

Bibliographical note

Publisher Copyright:
© 2021 IEEE

ASJC Scopus Subject Areas

Software
Computer Vision and Pattern Recognition

Cite this

Jiang, Y., Huang, Z., Pan, X., Loy, C. C., & Liu, Z. (2021). Talk-to-Edit: Fine-Grained Facial Editing via Dialog. In Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021 (pp. 13779-13788). (Proceedings of the IEEE International Conference on Computer Vision). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICCV48922.2021.01354

@inproceedings{c5ae980bd2a445778d4f8b61f27e93f0,
  title = "Talk-to-Edit: Fine-Grained Facial Editing via Dialog",
  author = "Yuming Jiang and Ziqi Huang and Xingang Pan and Loy, Chen Change and Ziwei Liu",
  note = "Publisher Copyright: {\textcopyright} 2021 IEEE; 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021; Conference date: 11-10-2021 Through 17-10-2021",
  year = "2021",
  doi = "10.1109/ICCV48922.2021.01354",
  language = "English",
  series = "Proceedings of the IEEE International Conference on Computer Vision",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
  pages = "13779--13788",
  booktitle = "Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021",
  address = "United States",
}

Jiang, Y, Huang, Z, Pan, X, Loy, CC & Liu, Z 2021, Talk-to-Edit: Fine-Grained Facial Editing via Dialog. in Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021. Proceedings of the IEEE International Conference on Computer Vision, Institute of Electrical and Electronics Engineers Inc., pp. 13779-13788, 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021, Virtual, Online, Canada, 10/11/21. https://doi.org/10.1109/ICCV48922.2021.01354

Talk-to-Edit: Fine-Grained Facial Editing via Dialog. / Jiang, Yuming; Huang, Ziqi; Pan, Xingang et al.
Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021. Institute of Electrical and Electronics Engineers Inc., 2021. p. 13779-13788 (Proceedings of the IEEE International Conference on Computer Vision).


TY  - GEN
T1  - Talk-to-Edit
T2  - 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021
AU  - Jiang, Yuming
AU  - Huang, Ziqi
AU  - Pan, Xingang
AU  - Loy, Chen Change
AU  - Liu, Ziwei
PY  - 2021
Y1  - 2021
AB  - Facial editing is an important task in vision and graphics with numerous applications. However, existing works are unable to deliver a continuous and fine-grained editing mode (e.g., editing a slightly smiling face into a big laughing one) with natural interactions with users. In this work, we propose Talk-to-Edit, an interactive facial editing framework that performs fine-grained attribute manipulation through dialog between the user and the system. Our key insight is to model a continual “semantic field” in the GAN latent space. 1) Unlike previous works that regard editing as traversing straight lines in the latent space, we formulate fine-grained editing as finding a curving trajectory that respects the fine-grained attribute landscape on the semantic field. 2) The curvature at each step is location-specific and determined by the input image as well as the user's language requests. 3) To engage the user in a meaningful dialog, our system generates language feedback that considers both the user request and the current state of the semantic field. We also contribute CelebA-Dialog, a visual-language facial editing dataset, to facilitate large-scale study. Specifically, each image has manually annotated fine-grained attribute labels as well as template-based textual descriptions in natural language. Extensive quantitative and qualitative experiments demonstrate the superiority of our framework in terms of 1) the smoothness of fine-grained editing, 2) identity/attribute preservation, and 3) visual photorealism and dialog fluency. Notably, a user study shows that our overall system is consistently favored by around 80% of the participants. Our project page is https://www.mmlab-ntu.com/project/talkedit/.
UR  - http://www.scopus.com/inward/record.url?scp=85121415696&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85121415696&partnerID=8YFLogxK
U2  - 10.1109/ICCV48922.2021.01354
DO  - 10.1109/ICCV48922.2021.01354
M3  - Conference contribution
AN  - SCOPUS:85121415696
T3  - Proceedings of the IEEE International Conference on Computer Vision
SP  - 13779
EP  - 13788
BT  - Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021
PB  - Institute of Electrical and Electronics Engineers Inc.
Y2  - 11 October 2021 through 17 October 2021
ER  - 
