Panoptic Video Scene Graph Generation

Jingkang Yang; Wenxuan Peng; Xiangtai Li; Zujin Guo; Liangyu Chen; Bo Li; Zheng Ma; Kaiyang Zhou; Wayne Zhang; Chen Change Loy; Ziwei Liu

doi:10.1109/CVPR52729.2023.01791

Corresponding author: Ziwei Liu

Research output: Contribution to journal › Conference article › peer-review

31 Citations (Scopus)

Abstract

Towards building comprehensive real-world visual perception systems, we propose and study a new problem called panoptic video scene graph generation (PVSG). PVSG is related to the existing video scene graph generation (VidSGG) problem, which focuses on temporal interactions between humans and objects localized with bounding boxes in videos. However, the limitation of bounding boxes in detecting non-rigid objects and backgrounds often causes VidSGG systems to miss key details that are crucial for comprehensive video understanding. In contrast, PVSG requires nodes in scene graphs to be grounded by more precise, pixel-level segmentation masks, which facilitate holistic scene understanding. To advance research in this new area, we contribute a high-quality PVSG dataset, which consists of 400 videos (289 third-person + 111 egocentric videos) with a total of 150K frames labeled with panoptic segmentation masks as well as fine, temporal scene graphs. We also provide a variety of baseline methods and share useful design practices for future work.

Original language	English
Pages (from-to)	18675-18685
Number of pages	11
Journal	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume	2023-June
DOIs	https://doi.org/10.1109/CVPR52729.2023.01791
Publication status	Published - 2023
Externally published	Yes
Event	2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023 - Vancouver, Canada
Duration	Jun 18 2023 → Jun 22 2023

Bibliographical note

Publisher Copyright:
©2023 IEEE.

ASJC Scopus Subject Areas

Software
Computer Vision and Pattern Recognition

Keywords

Scene analysis and understanding

Access to Document

10.1109/CVPR52729.2023.01791

Cite this

@article{5336bee574cc41059529e74ff32cb5dd,

title = "Panoptic Video Scene Graph Generation",

abstract = "Towards building comprehensive real-world visual perception systems, we propose and study a new problem called panoptic video scene graph generation (PVSG). PVSG is related to the existing video scene graph generation (VidSGG) problem, which focuses on temporal interactions between humans and objects localized with bounding boxes in videos. However, the limitation of bounding boxes in detecting non-rigid objects and backgrounds often causes VidSGG systems to miss key details that are crucial for comprehensive video understanding. In contrast, PVSG requires nodes in scene graphs to be grounded by more precise, pixel-level segmentation masks, which facilitate holistic scene understanding. To advance research in this new area, we contribute a high-quality PVSG dataset, which consists of 400 videos (289 third-person + 111 egocentric videos) with a total of 150K frames labeled with panoptic segmentation masks as well as fine, temporal scene graphs. We also provide a variety of baseline methods and share useful design practices for future work.",

keywords = "Scene analysis and understanding",

author = "Jingkang Yang and Wenxuan Peng and Xiangtai Li and Zujin Guo and Liangyu Chen and Bo Li and Zheng Ma and Kaiyang Zhou and Wayne Zhang and Loy, {Chen Change} and Ziwei Liu",

note = "Publisher Copyright: {\textcopyright} 2023 IEEE. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023; Conference date: 18-06-2023 through 22-06-2023",

year = "2023",

doi = "10.1109/CVPR52729.2023.01791",

language = "English",

volume = "2023-June",

pages = "18675--18685",

journal = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",

issn = "1063-6919",

publisher = "IEEE Computer Society",

}

TY - JOUR

T1 - Panoptic Video Scene Graph Generation

AU - Yang, Jingkang

AU - Peng, Wenxuan

AU - Li, Xiangtai

AU - Guo, Zujin

AU - Chen, Liangyu

AU - Li, Bo

AU - Ma, Zheng

AU - Zhou, Kaiyang

AU - Zhang, Wayne

AU - Loy, Chen Change

AU - Liu, Ziwei

PY - 2023

Y1 - 2023

N2 - Towards building comprehensive real-world visual perception systems, we propose and study a new problem called panoptic video scene graph generation (PVSG). PVSG is related to the existing video scene graph generation (VidSGG) problem, which focuses on temporal interactions between humans and objects localized with bounding boxes in videos. However, the limitation of bounding boxes in detecting non-rigid objects and backgrounds often causes VidSGG systems to miss key details that are crucial for comprehensive video understanding. In contrast, PVSG requires nodes in scene graphs to be grounded by more precise, pixel-level segmentation masks, which facilitate holistic scene understanding. To advance research in this new area, we contribute a high-quality PVSG dataset, which consists of 400 videos (289 third-person + 111 egocentric videos) with a total of 150K frames labeled with panoptic segmentation masks as well as fine, temporal scene graphs. We also provide a variety of baseline methods and share useful design practices for future work.

AB - Towards building comprehensive real-world visual perception systems, we propose and study a new problem called panoptic video scene graph generation (PVSG). PVSG is related to the existing video scene graph generation (VidSGG) problem, which focuses on temporal interactions between humans and objects localized with bounding boxes in videos. However, the limitation of bounding boxes in detecting non-rigid objects and backgrounds often causes VidSGG systems to miss key details that are crucial for comprehensive video understanding. In contrast, PVSG requires nodes in scene graphs to be grounded by more precise, pixel-level segmentation masks, which facilitate holistic scene understanding. To advance research in this new area, we contribute a high-quality PVSG dataset, which consists of 400 videos (289 third-person + 111 egocentric videos) with a total of 150K frames labeled with panoptic segmentation masks as well as fine, temporal scene graphs. We also provide a variety of baseline methods and share useful design practices for future work.

KW - Scene analysis and understanding

UR - http://www.scopus.com/inward/record.url?scp=85177617359&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85177617359&partnerID=8YFLogxK

U2 - 10.1109/CVPR52729.2023.01791

DO - 10.1109/CVPR52729.2023.01791

M3 - Conference article

AN - SCOPUS:85177617359

SN - 1063-6919

VL - 2023-June

SP - 18675

EP - 18685

JO - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

JF - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

T2 - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023

Y2 - 18 June 2023 through 22 June 2023

ER -
