TransMoMo: Invariance-driven unsupervised video motion retargeting

Zhuoqian Yang; Wentao Zhu; Wenyan Wu; Chen Qian; Qiang Zhou; Bolei Zhou; Chen Change Loy

doi:10.1109/CVPR42600.2020.00535

Corresponding author for this work: Zhuoqian Yang

Research output: Contribution to journal › Conference article › peer-review

52 Citations (Scopus)

Abstract

We present a lightweight video motion retargeting approach TransMoMo that is capable of transferring motion of a person in a source video realistically to another video of a target person (Fig. 1). Without using any paired data for supervision, the proposed method can be trained in an unsupervised manner by exploiting invariance properties of three orthogonal factors of variation including motion, structure, and view-angle. Specifically, with loss functions carefully derived based on invariance, we train an auto-encoder to disentangle the latent representations of such factors given the source and target video clips. This allows us to selectively transfer motion extracted from the source video seamlessly to the target video in spite of structural and view-angle disparities between the source and the target. The relaxed assumption of paired data allows our method to be trained on a vast amount of videos needless of manual annotation of source-target pairing, leading to improved robustness against large structural variations and extreme motion in videos. We demonstrate the effectiveness of our method over the state-of-the-art methods such as NKN [39], EDN [7] and LCM [3]. Code, model and data are publicly available on our project page.
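The recombination step described in the abstract (keep the source motion, take structure and view-angle from the target) can be illustrated with a toy sketch. This is not the paper's architecture: in the paper the three codes are learned by an auto-encoder, whereas here they are hand-specified for a single limb. The names `Codes`, `decode`, and `retarget` are hypothetical and chosen only for this illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Codes:
    """Toy stand-ins for the three disentangled factors (assumed, not learned)."""
    motion: Tuple[float, ...]  # per-frame limb angle in radians
    structure: float           # limb length
    view: float                # camera yaw in radians


def decode(c: Codes) -> List[Tuple[float, float]]:
    """Project the limb endpoint to 2D for every frame.

    The yaw foreshortens the horizontal coordinate; the limb length scales both.
    """
    return [
        (c.structure * math.cos(a) * math.cos(c.view), c.structure * math.sin(a))
        for a in c.motion
    ]


def retarget(source: Codes, target: Codes) -> Codes:
    """Keep the source motion; take structure and view from the target."""
    return Codes(motion=source.motion, structure=target.structure, view=target.view)


# A two-frame source motion transferred onto a longer limb seen from another angle.
source = Codes(motion=(0.0, math.pi / 2), structure=1.0, view=0.0)
target = Codes(motion=(math.pi / 4,), structure=2.0, view=math.pi / 3)
frames = decode(retarget(source, target))
```

In this toy setting the motion code is untouched by the swap, which is the invariance the abstract refers to: changing limb length or camera yaw alters the decoded frames but not the motion code itself.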

Original language	English
Article number	9156437
Pages (from-to)	5305-5314
Number of pages	10
Journal	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
DOIs	https://doi.org/10.1109/CVPR42600.2020.00535
Publication status	Published - 2020
Externally published	Yes
Event	2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020 - Virtual, Online, United States
Duration	Jun 14 2020 → Jun 19 2020

Bibliographical note

Publisher Copyright:
© 2020 IEEE

ASJC Scopus Subject Areas

Software
Computer Vision and Pattern Recognition

Access to Document

10.1109/CVPR42600.2020.00535

Cite this

@article{923fddce20974064ab79b0d66344a076,

title = "TransMoMo: Invariance-driven unsupervised video motion retargeting",

abstract = "We present a lightweight video motion retargeting approach TransMoMo that is capable of transferring motion of a person in a source video realistically to another video of a target person (Fig. 1). Without using any paired data for supervision, the proposed method can be trained in an unsupervised manner by exploiting invariance properties of three orthogonal factors of variation including motion, structure, and view-angle. Specifically, with loss functions carefully derived based on invariance, we train an auto-encoder to disentangle the latent representations of such factors given the source and target video clips. This allows us to selectively transfer motion extracted from the source video seamlessly to the target video in spite of structural and view-angle disparities between the source and the target. The relaxed assumption of paired data allows our method to be trained on a vast amount of videos needless of manual annotation of source-target pairing, leading to improved robustness against large structural variations and extreme motion in videos. We demonstrate the effectiveness of our method over the state-of-the-art methods such as NKN [39], EDN [7] and LCM [3]. Code, model and data are publicly available on our project page.",

author = "Zhuoqian Yang and Wentao Zhu and Wenyan Wu and Chen Qian and Qiang Zhou and Bolei Zhou and Loy, {Chen Change}",

note = "Publisher Copyright: {\textcopyright} 2020 IEEE; 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020 ; Conference date: 14-06-2020 Through 19-06-2020",

year = "2020",

doi = "10.1109/CVPR42600.2020.00535",

language = "English",

pages = "5305--5314",

journal = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",

issn = "1063-6919",

publisher = "IEEE Computer Society",

}

TY - JOUR

T1 - TransMoMo: Invariance-driven unsupervised video motion retargeting

T2 - 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020

AU - Yang, Zhuoqian

AU - Zhu, Wentao

AU - Wu, Wenyan

AU - Qian, Chen

AU - Zhou, Qiang

AU - Zhou, Bolei

AU - Loy, Chen Change

PY - 2020

Y1 - 2020

N2 - We present a lightweight video motion retargeting approach TransMoMo that is capable of transferring motion of a person in a source video realistically to another video of a target person (Fig. 1). Without using any paired data for supervision, the proposed method can be trained in an unsupervised manner by exploiting invariance properties of three orthogonal factors of variation including motion, structure, and view-angle. Specifically, with loss functions carefully derived based on invariance, we train an auto-encoder to disentangle the latent representations of such factors given the source and target video clips. This allows us to selectively transfer motion extracted from the source video seamlessly to the target video in spite of structural and view-angle disparities between the source and the target. The relaxed assumption of paired data allows our method to be trained on a vast amount of videos needless of manual annotation of source-target pairing, leading to improved robustness against large structural variations and extreme motion in videos. We demonstrate the effectiveness of our method over the state-of-the-art methods such as NKN [39], EDN [7] and LCM [3]. Code, model and data are publicly available on our project page.

AB - We present a lightweight video motion retargeting approach TransMoMo that is capable of transferring motion of a person in a source video realistically to another video of a target person (Fig. 1). Without using any paired data for supervision, the proposed method can be trained in an unsupervised manner by exploiting invariance properties of three orthogonal factors of variation including motion, structure, and view-angle. Specifically, with loss functions carefully derived based on invariance, we train an auto-encoder to disentangle the latent representations of such factors given the source and target video clips. This allows us to selectively transfer motion extracted from the source video seamlessly to the target video in spite of structural and view-angle disparities between the source and the target. The relaxed assumption of paired data allows our method to be trained on a vast amount of videos needless of manual annotation of source-target pairing, leading to improved robustness against large structural variations and extreme motion in videos. We demonstrate the effectiveness of our method over the state-of-the-art methods such as NKN [39], EDN [7] and LCM [3]. Code, model and data are publicly available on our project page.

UR - http://www.scopus.com/inward/record.url?scp=85094324834&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85094324834&partnerID=8YFLogxK

U2 - 10.1109/CVPR42600.2020.00535

DO - 10.1109/CVPR42600.2020.00535

M3 - Conference article

AN - SCOPUS:85094324834

SN - 1063-6919

SP - 5305

EP - 5314

JO - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

JF - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

M1 - 9156437

Y2 - 14 June 2020 through 19 June 2020

ER -
