Abstract
In this work, we present the Digital Life Project (DLP), a framework that uses language as the universal medium to build autonomous 3D characters capable of engaging in social interactions and expressing themselves through articulated body motions, thereby simulating life in a digital environment. Our framework comprises two primary components: 1) SocioMind: a meticulously crafted digital brain that models personalities with systematic few-shot exemplars, incorporates a reflection process based on psychology principles, and emulates autonomy by initiating dialogue topics; 2) MoMat-MoGen: a text-driven motion synthesis paradigm for controlling the character's digital body. It integrates motion matching, a proven industry technique that ensures motion quality, with cutting-edge advancements in motion generation for diversity. Extensive experiments demonstrate that each module achieves state-of-the-art performance in its respective domain. Collectively, they enable virtual characters to initiate and sustain dialogues autonomously while evolving their socio-psychological states. Concurrently, these characters can perform contextually relevant bodily movements. Additionally, an extension of DLP enables a virtual character to recognize and appropriately respond to human players' actions.
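To make the two-component design concrete, below is a minimal, illustrative Python sketch of how an autonomous-character loop might connect a SocioMind-style digital brain to a MoMat-MoGen-style motion module. All class names, method signatures, example strings, and the control flow here are assumptions introduced for illustration only; they are not taken from the paper or its released code.

```python
from dataclasses import dataclass, field


@dataclass
class SocioMind:
    """Illustrative stand-in for the digital brain: personality exemplars,
    memory, reflection, and autonomous topic initiation (hypothetical API)."""
    personality_exemplars: list
    memory: list = field(default_factory=list)

    def propose_topic(self) -> str:
        # Autonomy: the character initiates a dialogue topic instead of only reacting.
        return "Shall we plan this weekend's hike?"

    def respond(self, utterance: str) -> tuple[str, str]:
        # Returns (reply text, motion description) conditioned on personality and memory.
        self.memory.append(utterance)
        return "Sounds great, let's go!", "wave hand excitedly, then nod"

    def reflect(self) -> None:
        # Periodic reflection over stored memories, updating the socio-psychological state.
        ...


class MoMatMoGen:
    """Illustrative stand-in for text-driven motion synthesis: matching for
    quality plus generation for diversity (hypothetical API)."""
    def synthesize(self, motion_text: str) -> str:
        # A real system would retrieve a matched clip and refine it generatively;
        # here we just return a placeholder token for the clip.
        return f"<motion clip for: {motion_text}>"


def dialogue_turn(listener: SocioMind, body: MoMatMoGen, opening: str):
    # One exchange: the listener replies in text and performs a matching body motion.
    reply, motion_text = listener.respond(opening)
    return reply, body.synthesize(motion_text)


if __name__ == "__main__":
    alice = SocioMind(personality_exemplars=["extraverted", "adventurous"])
    bob = SocioMind(personality_exemplars=["calm", "thoughtful"])
    body = MoMatMoGen()
    topic = alice.propose_topic()
    reply, motion = dialogue_turn(bob, body, topic)
    print(topic, "->", reply, "|", motion)
```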
Original language | English |
---|---|
Pages (from-to) | 582-592 |
Number of pages | 11 |
Journal | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition |
DOIs | |
Publication status | Published - 2024 |
Externally published | Yes |
Event | 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 - Seattle, United States; Duration: Jun 16 2024 → Jun 22 2024 |
Bibliographical note
Publisher Copyright: © 2024 IEEE.
ASJC Scopus Subject Areas
- Software
- Computer Vision and Pattern Recognition
Keywords
- Agents
- Embodied AI
- Large Language Models
- Motion Generation