Generalizable Implicit Motion Modeling for Video Frame Interpolation

Zujin Guo; Wei Li; Chen Change Loy

Nanyang Technological University

Research output: Contribution to journal › Conference article › peer-review

2 Citations (Scopus)

Abstract

Motion modeling is critical in flow-based Video Frame Interpolation (VFI). Existing paradigms either consider linear combinations of bidirectional flows or directly predict bilateral flows for given timestamps without exploring favorable motion priors, thus lacking the capability of effectively modeling spatiotemporal dynamics in real-world videos. To address this limitation, in this study, we introduce Generalizable Implicit Motion Modeling (GIMM), a novel and effective approach to motion modeling for VFI. Specifically, to enable GIMM as an effective motion modeling paradigm, we design a motion encoding pipeline to model spatiotemporal motion latent from bidirectional flows extracted from pre-trained flow estimators, effectively representing input-specific motion priors. Then, we implicitly predict arbitrary-timestep optical flows within two adjacent input frames via an adaptive coordinate-based neural network, with spatiotemporal coordinates and motion latent as inputs. Our GIMM can be easily integrated with existing flow-based VFI works by supplying accurately modeled motion. We show that GIMM performs better than the current state of the art on standard VFI benchmarks.
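
For context, the "linear combinations of bidirectional flows" paradigm that the abstract contrasts with GIMM can be sketched as follows. This is not the GIMM method and is not taken from the paper. It is a minimal illustration that assumes one commonly used linear-motion approximation, with flows stored as flat lists of (dx, dy) vectors. The function name `combine_bidirectional_flows` and the `Flow` type are hypothetical.

```python
from typing import List, Tuple

# Assumption: a flow field is a flat list holding one (dx, dy) vector per pixel.
Flow = List[Tuple[float, float]]


def combine_bidirectional_flows(
    flow_01: Flow, flow_10: Flow, t: float
) -> Tuple[Flow, Flow]:
    """Approximate the flows from timestep t to frame 0 and to frame 1.

    Uses a linear-motion assumption (hypothetical baseline, not GIMM):
        F_t->0 = -(1 - t) * t * F_0->1 + t^2 * F_1->0
        F_t->1 = (1 - t)^2 * F_0->1 - t * (1 - t) * F_1->0
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if len(flow_01) != len(flow_10):
        raise ValueError("both flow fields must cover the same pixels")

    # Weights of the two input flows for each target direction.
    a, b = -(1.0 - t) * t, t * t
    c, d = (1.0 - t) ** 2, -t * (1.0 - t)

    flow_t0: Flow = []
    flow_t1: Flow = []
    for (u01, v01), (u10, v10) in zip(flow_01, flow_10):
        flow_t0.append((a * u01 + b * u10, a * v01 + b * v10))
        flow_t1.append((c * u01 + d * u10, c * v01 + d * v10))
    return flow_t0, flow_t1
```

Because the weights depend only on t, every pixel is assumed to move along a straight line at constant speed between the two frames. This is the kind of fixed motion prior the abstract describes as insufficient for real-world spatiotemporal dynamics.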

Original language	English
Journal	Advances in Neural Information Processing Systems
Volume	37
Publication status	Published - 2024
Externally published	Yes
Event	38th Conference on Neural Information Processing Systems, NeurIPS 2024 - Vancouver, Canada
Duration	Dec 9 2024 → Dec 15 2024

Bibliographical note

Publisher Copyright:
© 2024 Neural Information Processing Systems Foundation. All rights reserved.

ASJC Scopus Subject Areas

Computer Networks and Communications
Information Systems
Signal Processing

Cite this

@article{f4584139995649d8b18b2bf03b3d6d33,

title = "Generalizable Implicit Motion Modeling for Video Frame Interpolation",

abstract = "Motion modeling is critical in flow-based Video Frame Interpolation (VFI). Existing paradigms either consider linear combinations of bidirectional flows or directly predict bilateral flows for given timestamps without exploring favorable motion priors, thus lacking the capability of effectively modeling spatiotemporal dynamics in real-world videos. To address this limitation, in this study, we introduce Generalizable Implicit Motion Modeling (GIMM), a novel and effective approach to motion modeling for VFI. Specifically, to enable GIMM as an effective motion modeling paradigm, we design a motion encoding pipeline to model spatiotemporal motion latent from bidirectional flows extracted from pre-trained flow estimators, effectively representing input-specific motion priors. Then, we implicitly predict arbitrary-timestep optical flows within two adjacent input frames via an adaptive coordinate-based neural network, with spatiotemporal coordinates and motion latent as inputs. Our GIMM can be easily integrated with existing flow-based VFI works by supplying accurately modeled motion. We show that GIMM performs better than the current state of the art on standard VFI benchmarks.",

author = "Guo, Zujin and Li, Wei and Loy, Chen Change",

note = "Publisher Copyright: {\textcopyright} 2024 Neural Information Processing Systems Foundation. All rights reserved.; 38th Conference on Neural Information Processing Systems, NeurIPS 2024 ; Conference date: 09-12-2024 Through 15-12-2024",

year = "2024",

language = "English",

volume = "37",

journal = "Advances in Neural Information Processing Systems",

issn = "1049-5258",

}

TY - JOUR

T1 - Generalizable Implicit Motion Modeling for Video Frame Interpolation

AU - Guo, Zujin

AU - Li, Wei

AU - Loy, Chen Change

PY - 2024

Y1 - 2024

N2 - Motion modeling is critical in flow-based Video Frame Interpolation (VFI). Existing paradigms either consider linear combinations of bidirectional flows or directly predict bilateral flows for given timestamps without exploring favorable motion priors, thus lacking the capability of effectively modeling spatiotemporal dynamics in real-world videos. To address this limitation, in this study, we introduce Generalizable Implicit Motion Modeling (GIMM), a novel and effective approach to motion modeling for VFI. Specifically, to enable GIMM as an effective motion modeling paradigm, we design a motion encoding pipeline to model spatiotemporal motion latent from bidirectional flows extracted from pre-trained flow estimators, effectively representing input-specific motion priors. Then, we implicitly predict arbitrary-timestep optical flows within two adjacent input frames via an adaptive coordinate-based neural network, with spatiotemporal coordinates and motion latent as inputs. Our GIMM can be easily integrated with existing flow-based VFI works by supplying accurately modeled motion. We show that GIMM performs better than the current state of the art on standard VFI benchmarks.

UR - http://www.scopus.com/inward/record.url?scp=105000533953&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=105000533953&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:105000533953

SN - 1049-5258

VL - 37

JO - Advances in Neural Information Processing Systems

JF - Advances in Neural Information Processing Systems

T2 - 38th Conference on Neural Information Processing Systems, NeurIPS 2024

Y2 - 9 December 2024 through 15 December 2024

ER -
