Abstract
In this work, we address a broad range of segmentation tasks, each traditionally tackled by distinct or partially unified models. We propose OMG-Seg, One Model that is Good enough to efficiently and effectively handle all of these tasks, including image semantic, instance, and panoptic segmentation, their video counterparts, open-vocabulary settings, prompt-driven interactive segmentation in the style of SAM, and video object segmentation. To our knowledge, this is the first single model to handle all of these tasks and achieve satisfactory performance. We show that OMG-Seg, a transformer-based encoder-decoder architecture with task-specific queries and outputs, can support more than ten distinct segmentation tasks while significantly reducing computational and parameter overhead across tasks and datasets. We also rigorously evaluate inter-task influences and correlations during co-training.
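The architecture described above, a shared encoder-decoder in which every task is expressed as a set of queries that produce masks and class logits, can be illustrated with a minimal sketch. The code below is not the authors' implementation: the names (`OMGSegSketch`, `prompt_proj`), the simple convolutional stand-in for the frozen backbone and pixel decoder, and all hyperparameters are illustrative assumptions; only the overall query-based decoding pattern follows the abstract.

```python
import torch
import torch.nn as nn

class OMGSegSketch(nn.Module):
    """Minimal sketch of a unified query-based segmentation model (assumed names)."""

    def __init__(self, dim=256, num_queries=100, num_classes=133, num_layers=3):
        super().__init__()
        # Stand-in for the (frozen) backbone + pixel decoder producing per-pixel features.
        self.pixel_encoder = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        # Learnable object queries shared by semantic/instance/panoptic/video tasks.
        self.object_queries = nn.Embedding(num_queries, dim)
        # Hypothetical encoder turning (x, y) point prompts into extra queries
        # for the interactive, SAM-like setting.
        self.prompt_proj = nn.Linear(2, dim)
        self.cls_head = nn.Linear(dim, num_classes + 1)  # +1 for the "no object" class

    def forward(self, images, point_prompts=None):
        feats = self.pixel_encoder(images)               # (B, C, H/16, W/16)
        b = feats.shape[0]
        memory = feats.flatten(2).transpose(1, 2)        # (B, HW, C)
        queries = self.object_queries.weight.unsqueeze(0).expand(b, -1, -1)
        if point_prompts is not None:
            # Interactive mode: prompts become additional task-specific queries.
            queries = torch.cat([queries, self.prompt_proj(point_prompts)], dim=1)
        q = self.decoder(queries, memory)                # (B, Q, C)
        # Every task shares the same output format: mask logits + class logits.
        masks = torch.einsum("bqc,bchw->bqhw", q, feats)
        logits = self.cls_head(q)
        return masks, logits

model = OMGSegSketch()
imgs = torch.randn(2, 3, 256, 256)
points = torch.rand(2, 5, 2)                             # five point prompts per image
masks, logits = model(imgs, point_prompts=points)
print(masks.shape, logits.shape)  # (2, 105, 16, 16) and (2, 105, 134)
```

Representing point prompts as extra decoder queries is what lets the interactive, SAM-style task share the same decoder and output heads as the other segmentation tasks, which is the source of the parameter savings the abstract claims.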
| Original language | English |
|---|---|
| Pages (from-to) | 27948-27959 |
| Number of pages | 12 |
| Journal | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition |
| DOIs | |
| Publication status | Published - 2024 |
| Externally published | Yes |
| Event | 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 - Seattle, United States |
| Duration | Jun 16 2024 → Jun 22 2024 |
Bibliographical note
Publisher Copyright: © 2024 IEEE.
ASJC Scopus Subject Areas
- Software
- Computer Vision and Pattern Recognition