Bailando++: 3D Dance GPT With Choreographic Memory

Li Siyao; Weijiang Yu; Tianpei Gu; Chunze Lin; Quan Wang; Chen Qian; Chen Change Loy; Ziwei Liu

doi:10.1109/TPAMI.2023.3319435

Corresponding author: Li Siyao

Research output: Contribution to journal › Article › peer-review

15 Citations (Scopus)

Abstract

Our proposed music-to-dance framework, Bailando++, addresses the challenges of driving 3D characters to dance in a way that follows the constraints of choreography norms and maintains temporal coherency with different music genres. Bailando++ consists of two components: a choreographic memory that learns to summarize meaningful dancing units from 3D pose sequences, and an actor-critic Generative Pre-trained Transformer (GPT) that composes these units into a fluent dance coherent to the music. In particular, to synchronize the diverse motion tempos and music beats, we introduce an actor-critic-based reinforcement learning scheme to the GPT with a novel beat-align reward function. Additionally, we consider learning human dance poses in the rotation domain to avoid body distortions incompatible with human morphology, and introduce a musical contextual encoding to allow the motion GPT to grasp longer-term patterns of music. Our experiments on the standard benchmark show that Bailando++ achieves state-of-the-art performance both qualitatively and quantitatively, with the added benefit of the unsupervised discovery of human-interpretable dancing-style poses in the choreographic memory.
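The abstract describes the choreographic memory only at a high level. Because the record's keywords list VQ-VAE, the sketch below assumes the memory works like a standard VQ-VAE codebook: each encoded pose feature is replaced by its nearest learned code, and that code's index serves as a discrete "dancing unit". This is a minimal illustration of the generic quantization step with toy, made-up numbers. It is not the authors' implementation, and the helper names (`squared_distance`, `quantize`) and the 2-D feature size are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance; has the same argmin as the true distance, without a sqrt."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: List[Vector], codebook: List[Vector]) -> Tuple[List[int], List[Vector]]:
    """Map each encoded feature to its nearest codebook entry (generic VQ-VAE lookup).

    Hypothetical helper for illustration only; not taken from the Bailando++ code.
    Returns the chosen code indices and the corresponding quantized vectors.
    """
    indices: List[int] = []
    quantized: List[Vector] = []
    for f in features:
        # Index of the codebook entry closest to this feature (first one wins on ties).
        best = min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        indices.append(best)
        quantized.append(list(codebook[best]))
    return indices, quantized


if __name__ == "__main__":
    # Toy 2-D codebook holding three hypothetical "dancing units".
    codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
    # Made-up encoder outputs for four time steps of a pose sequence.
    features = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7], [0.7, 0.4]]
    idx, _ = quantize(features, codebook)
    print(idx)  # [0, 1, 2, 1]
```

Run as a script, this prints `[0, 1, 2, 1]`. Each time step is now summarized by a code index, so a sequence model such as the motion GPT mentioned in the abstract can, under this assumption, predict sequences of discrete indices instead of raw 3D poses.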

Original language	English
Pages (from-to)	14192-14207
Number of pages	16
Journal	IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume	45
Issue number	12
DOIs	https://doi.org/10.1109/TPAMI.2023.3319435
Publication status	Published - Dec 1 2023
Externally published	Yes

Bibliographical note

Publisher Copyright:
© 1979-2012 IEEE.

ASJC Scopus Subject Areas

Software
Computer Vision and Pattern Recognition
Computational Theory and Mathematics
Artificial Intelligence
Applied Mathematics

Keywords

3D human motion
dance generation
GPT
multi-modal
VQ-VAE


Cite this

@article{98a2aea29bc94bfb8936f6a8c4597d22,
  title = "{Bailando++}: {3D} Dance {GPT} With Choreographic Memory",
  abstract = "Our proposed music-to-dance framework, Bailando++, addresses the challenges of driving 3D characters to dance in a way that follows the constraints of choreography norms and maintains temporal coherency with different music genres. Bailando++ consists of two components: a choreographic memory that learns to summarize meaningful dancing units from 3D pose sequences, and an actor-critic Generative Pre-trained Transformer (GPT) that composes these units into a fluent dance coherent to the music. In particular, to synchronize the diverse motion tempos and music beats, we introduce an actor-critic-based reinforcement learning scheme to the GPT with a novel beat-align reward function. Additionally, we consider learning human dance poses in the rotation domain to avoid body distortions incompatible with human morphology, and introduce a musical contextual encoding to allow the motion GPT to grasp longer-term patterns of music. Our experiments on the standard benchmark show that Bailando++ achieves state-of-the-art performance both qualitatively and quantitatively, with the added benefit of the unsupervised discovery of human-interpretable dancing-style poses in the choreographic memory.",
  keywords = "3D human motion, dance generation, GPT, multi-modal, VQ-VAE",
  author = "Li Siyao and Weijiang Yu and Tianpei Gu and Chunze Lin and Quan Wang and Chen Qian and Loy, Chen Change and Ziwei Liu",
  note = "Publisher Copyright: {\textcopyright} 1979-2012 IEEE.",
  year = "2023",
  month = dec,
  day = "1",
  doi = "10.1109/TPAMI.2023.3319435",
  language = "English",
  volume = "45",
  pages = "14192--14207",
  journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",
  issn = "0162-8828",
  publisher = "IEEE Computer Society",
  number = "12",
}

TY  - JOUR
T1  - Bailando++: 3D Dance GPT With Choreographic Memory
AU  - Siyao, Li
AU  - Yu, Weijiang
AU  - Gu, Tianpei
AU  - Lin, Chunze
AU  - Wang, Quan
AU  - Qian, Chen
AU  - Loy, Chen Change
AU  - Liu, Ziwei
PY  - 2023/12/1
Y1  - 2023/12/1
N2  - Our proposed music-to-dance framework, Bailando++, addresses the challenges of driving 3D characters to dance in a way that follows the constraints of choreography norms and maintains temporal coherency with different music genres. Bailando++ consists of two components: a choreographic memory that learns to summarize meaningful dancing units from 3D pose sequences, and an actor-critic Generative Pre-trained Transformer (GPT) that composes these units into a fluent dance coherent to the music. In particular, to synchronize the diverse motion tempos and music beats, we introduce an actor-critic-based reinforcement learning scheme to the GPT with a novel beat-align reward function. Additionally, we consider learning human dance poses in the rotation domain to avoid body distortions incompatible with human morphology, and introduce a musical contextual encoding to allow the motion GPT to grasp longer-term patterns of music. Our experiments on the standard benchmark show that Bailando++ achieves state-of-the-art performance both qualitatively and quantitatively, with the added benefit of the unsupervised discovery of human-interpretable dancing-style poses in the choreographic memory.
AB  - Our proposed music-to-dance framework, Bailando++, addresses the challenges of driving 3D characters to dance in a way that follows the constraints of choreography norms and maintains temporal coherency with different music genres. Bailando++ consists of two components: a choreographic memory that learns to summarize meaningful dancing units from 3D pose sequences, and an actor-critic Generative Pre-trained Transformer (GPT) that composes these units into a fluent dance coherent to the music. In particular, to synchronize the diverse motion tempos and music beats, we introduce an actor-critic-based reinforcement learning scheme to the GPT with a novel beat-align reward function. Additionally, we consider learning human dance poses in the rotation domain to avoid body distortions incompatible with human morphology, and introduce a musical contextual encoding to allow the motion GPT to grasp longer-term patterns of music. Our experiments on the standard benchmark show that Bailando++ achieves state-of-the-art performance both qualitatively and quantitatively, with the added benefit of the unsupervised discovery of human-interpretable dancing-style poses in the choreographic memory.
KW  - 3D human motion
KW  - dance generation
KW  - GPT
KW  - multi-modal
KW  - VQ-VAE
UR  - http://www.scopus.com/inward/record.url?scp=85173064445&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85173064445&partnerID=8YFLogxK
U2  - 10.1109/TPAMI.2023.3319435
DO  - 10.1109/TPAMI.2023.3319435
M3  - Article
C2  - 37751342
AN  - SCOPUS:85173064445
SN  - 0162-8828
VL  - 45
SP  - 14192
EP  - 14207
JO  - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF  - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS  - 12
ER  -
