Abstract
StyleGAN has excelled in 2D face reconstruction and semantic editing, but the extension to 3D lacks a generic inversion framework, limiting its applications in 3D reconstruction. In this paper, we address the challenge of 3D GAN inversion, focusing on predicting a latent code from a single 2D image to faithfully recover 3D shapes and textures. The inherent ill-posed nature of the problem, coupled with the limited capacity of global latent codes, presents significant challenges. To overcome these challenges, we introduce an efficient self-training scheme that does not rely on real-world 2D-3D pairs but instead utilizes proxy samples generated from a 3D GAN. Additionally, our approach goes beyond the global latent code by enhancing the generation network with a local branch. This branch incorporates pixel-aligned features to accurately reconstruct texture details. Furthermore, we introduce a novel pipeline for 3D view-consistent editing. The efficacy of our method is validated on two representative 3D GANs, namely StyleSDF and EG3D. Through extensive experiments, we demonstrate that our approach consistently outperforms state-of-the-art inversion methods, delivering superior quality in both shape and texture reconstruction.
Original language | English |
---|---|
Journal | International Journal of Computer Vision |
DOIs | |
Publication status | Accepted/In press - 2025 |
Externally published | Yes |
Bibliographical note
Publisher Copyright:© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025.
ASJC Scopus Subject Areas
- Software
- Computer Vision and Pattern Recognition
- Artificial Intelligence