Deep Animation Video Interpolation in the Wild

Li Siyao; Shiyu Zhao; Weijiang Yu; Wenxiu Sun; Dimitris Metaxas; Chen Change Loy; Ziwei Liu

doi:10.1109/CVPR46437.2021.00652

Deep Animation Video Interpolation in the Wild

Li Siyao, Shiyu Zhao, Weijiang Yu, Wenxiu Sun, Dimitris Metaxas, Chen Change Loy, Ziwei Liu^*

^*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

89 Citations (Scopus)

Abstract

In the animation industry, cartoon videos are usually produced at low frame rate since hand drawing of such frames is costly and time-consuming. Therefore, it is desirable to develop computational models that can automatically interpolate the in-between animation frames. However, existing video interpolation methods fail to produce satisfying results on animation data. Compared to natural videos, animation videos possess two unique characteristics that make frame interpolation difficult: 1) cartoons comprise lines and smooth color pieces. The smooth areas lack textures and make it difficult to estimate accurate motions on animation videos. 2) cartoons express stories via exaggeration. Some of the motions are non-linear and extremely large. In this work, we formally define and study the animation video interpolation problem for the first time. To address the aforementioned challenges, we propose an effective framework, AnimeInterp, with two dedicated modules in a coarse-to-fine manner. Specifically, 1) Segment-Guided Matching resolves the “lack of textures” challenge by exploiting global matching among color pieces that are piecewise coherent. 2) Recurrent Flow Refinement resolves the “non-linear and extremely large motion” challenge by recurrent predictions using a transformer-like architecture. To facilitate comprehensive training and evaluations, we build a large-scale animation triplet dataset, ATD-12K, which comprises 12,000 triplets with rich annotations. Extensive experiments demonstrate that our approach outperforms existing state-of-the-art interpolation methods for animation videos. Notably, AnimeInterp shows favorable perceptual quality and robustness for animation scenarios in the wild. The proposed dataset and code are available at https://github.com/lisiyao21/AnimeInterp/.

Original language	English
Title of host publication	Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021
Publisher	IEEE Computer Society
Pages	6583-6591
Number of pages	9
ISBN (Electronic)	9781665445092
DOIs	https://doi.org/10.1109/CVPR46437.2021.00652
Publication status	Published - 2021
Externally published	Yes
Event	2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021 - Virtual, Online, United States Duration: Jun 19 2021 → Jun 25 2021

Publication series

Name	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (Print)	1063-6919

Conference

Conference	2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021
Country/Territory	United States
City	Virtual, Online
Period	6/19/21 → 6/25/21

Bibliographical note

Publisher Copyright:
© 2021 IEEE

ASJC Scopus Subject Areas

Software
Computer Vision and Pattern Recognition

Access to Document

10.1109/CVPR46437.2021.00652

Cite this

Siyao, L., Zhao, S., Yu, W., Sun, W., Metaxas, D., Loy, C. C., & Liu, Z. (2021). Deep Animation Video Interpolation in the Wild. In Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021 (pp. 6583-6591). (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). IEEE Computer Society. https://doi.org/10.1109/CVPR46437.2021.00652

@inproceedings{c00a3978367c49f4a6d92d198ff0f0c5,

title = "Deep Animation Video Interpolation in the Wild",

abstract = "In the animation industry, cartoon videos are usually produced at low frame rate since hand drawing of such frames is costly and time-consuming. Therefore, it is desirable to develop computational models that can automatically interpolate the in-between animation frames. However, existing video interpolation methods fail to produce satisfying results on animation data. Compared to natural videos, animation videos possess two unique characteristics that make frame interpolation difficult: 1) cartoons comprise lines and smooth color pieces. The smooth areas lack textures and make it difficult to estimate accurate motions on animation videos. 2) cartoons express stories via exaggeration. Some of the motions are non-linear and extremely large. In this work, we formally define and study the animation video interpolation problem for the first time. To address the aforementioned challenges, we propose an effective framework, AnimeInterp, with two dedicated modules in a coarse-to-fine manner. Specifically, 1) Segment-Guided Matching resolves the “lack of textures” challenge by exploiting global matching among color pieces that are piecewise coherent. 2) Recurrent Flow Refinement resolves the “non-linear and extremely large motion” challenge by recurrent predictions using a transformer-like architecture. To facilitate comprehensive training and evaluations, we build a large-scale animation triplet dataset, ATD-12K, which comprises 12,000 triplets with rich annotations. Extensive experiments demonstrate that our approach outperforms existing state-of-the-art interpolation methods for animation videos. Notably, AnimeInterp shows favorable perceptual quality and robustness for animation scenarios in the wild. The proposed dataset and code are available at https://github.com/lisiyao21/AnimeInterp/.",

author = "Li Siyao and Shiyu Zhao and Weijiang Yu and Wenxiu Sun and Dimitris Metaxas and Loy, \{Chen Change\} and Ziwei Liu",

note = "Publisher Copyright: {\textcopyright} 2021 IEEE; 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021 ; Conference date: 19-06-2021 Through 25-06-2021",

year = "2021",

doi = "10.1109/CVPR46437.2021.00652",

language = "English",

series = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",

publisher = "IEEE Computer Society",

pages = "6583--6591",

booktitle = "Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021",

address = "United States",

}

Siyao, L, Zhao, S, Yu, W, Sun, W, Metaxas, D, Loy, CC & Liu, Z 2021, Deep Animation Video Interpolation in the Wild. in Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, pp. 6583-6591, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021, Virtual, Online, United States, 6/19/21. https://doi.org/10.1109/CVPR46437.2021.00652

Deep Animation Video Interpolation in the Wild. / Siyao, Li; Zhao, Shiyu; Yu, Weijiang et al.
Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021. IEEE Computer Society, 2021. p. 6583-6591 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

TY - GEN

T1 - Deep Animation Video Interpolation in the Wild

AU - Siyao, Li

AU - Zhao, Shiyu

AU - Yu, Weijiang

AU - Sun, Wenxiu

AU - Metaxas, Dimitris

AU - Loy, Chen Change

AU - Liu, Ziwei

PY - 2021

Y1 - 2021

N2 - In the animation industry, cartoon videos are usually produced at low frame rate since hand drawing of such frames is costly and time-consuming. Therefore, it is desirable to develop computational models that can automatically interpolate the in-between animation frames. However, existing video interpolation methods fail to produce satisfying results on animation data. Compared to natural videos, animation videos possess two unique characteristics that make frame interpolation difficult: 1) cartoons comprise lines and smooth color pieces. The smooth areas lack textures and make it difficult to estimate accurate motions on animation videos. 2) cartoons express stories via exaggeration. Some of the motions are non-linear and extremely large. In this work, we formally define and study the animation video interpolation problem for the first time. To address the aforementioned challenges, we propose an effective framework, AnimeInterp, with two dedicated modules in a coarse-to-fine manner. Specifically, 1) Segment-Guided Matching resolves the “lack of textures” challenge by exploiting global matching among color pieces that are piecewise coherent. 2) Recurrent Flow Refinement resolves the “non-linear and extremely large motion” challenge by recurrent predictions using a transformer-like architecture. To facilitate comprehensive training and evaluations, we build a large-scale animation triplet dataset, ATD-12K, which comprises 12,000 triplets with rich annotations. Extensive experiments demonstrate that our approach outperforms existing state-of-the-art interpolation methods for animation videos. Notably, AnimeInterp shows favorable perceptual quality and robustness for animation scenarios in the wild. The proposed dataset and code are available at https://github.com/lisiyao21/AnimeInterp/.

AB - In the animation industry, cartoon videos are usually produced at low frame rate since hand drawing of such frames is costly and time-consuming. Therefore, it is desirable to develop computational models that can automatically interpolate the in-between animation frames. However, existing video interpolation methods fail to produce satisfying results on animation data. Compared to natural videos, animation videos possess two unique characteristics that make frame interpolation difficult: 1) cartoons comprise lines and smooth color pieces. The smooth areas lack textures and make it difficult to estimate accurate motions on animation videos. 2) cartoons express stories via exaggeration. Some of the motions are non-linear and extremely large. In this work, we formally define and study the animation video interpolation problem for the first time. To address the aforementioned challenges, we propose an effective framework, AnimeInterp, with two dedicated modules in a coarse-to-fine manner. Specifically, 1) Segment-Guided Matching resolves the “lack of textures” challenge by exploiting global matching among color pieces that are piecewise coherent. 2) Recurrent Flow Refinement resolves the “non-linear and extremely large motion” challenge by recurrent predictions using a transformer-like architecture. To facilitate comprehensive training and evaluations, we build a large-scale animation triplet dataset, ATD-12K, which comprises 12,000 triplets with rich annotations. Extensive experiments demonstrate that our approach outperforms existing state-of-the-art interpolation methods for animation videos. Notably, AnimeInterp shows favorable perceptual quality and robustness for animation scenarios in the wild. The proposed dataset and code are available at https://github.com/lisiyao21/AnimeInterp/.

UR - http://www.scopus.com/inward/record.url?scp=85117580335&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85117580335&partnerID=8YFLogxK

U2 - 10.1109/CVPR46437.2021.00652

DO - 10.1109/CVPR46437.2021.00652

M3 - Conference contribution

AN - SCOPUS:85117580335

T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

SP - 6583

EP - 6591

BT - Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021

PB - IEEE Computer Society

T2 - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021

Y2 - 19 June 2021 through 25 June 2021

ER -

Siyao L, Zhao S, Yu W, Sun W, Metaxas D, Loy CC et al. Deep Animation Video Interpolation in the Wild. In Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021. IEEE Computer Society. 2021. p. 6583-6591. (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). doi: 10.1109/CVPR46437.2021.00652

Deep Animation Video Interpolation in the Wild

Abstract

Publication series

Conference

Bibliographical note

ASJC Scopus Subject Areas

Access to Document

Other files and links

Cite this