Abstract
In-context segmentation has drawn increasing attention with the advent of vision foundation models. Its goal is to segment objects using given reference images. Most existing approaches adopt metric learning or masked image modeling to build the correlation between visual prompts and input image queries. This work approaches the problem from a fresh perspective: unlocking the capability of the latent diffusion model (LDM) for in-context segmentation and investigating different design choices. Specifically, we examine the problem from three angles: instruction extraction, output alignment, and meta-architectures. We design a two-stage masking strategy to prevent interfering information from leaking into the instructions. In addition, we propose an augmented pseudo-masking target to ensure the model predicts without forgetting the original images. Moreover, we build a new and fair in-context segmentation benchmark that covers both image and video datasets. Experiments validate the effectiveness of our approach, demonstrating results comparable to or even stronger than those of previous specialist or visual foundation models. We hope our work inspires others to rethink the unification of segmentation and generation.
Original language | English |
---|---|
Title of host publication | Special Track on AI Alignment |
Editors | Toby Walsh, Julie Shah, Zico Kolter |
Publisher | Association for the Advancement of Artificial Intelligence |
Pages | 7545-7553 |
Number of pages | 9 |
Edition | 7 |
ISBN (Electronic) | 157735897X, 9781577358978 |
DOIs | |
Publication status | Published - Apr 11 2025 |
Externally published | Yes |
Event | 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025 - Philadelphia, United States; Duration: Feb 25 2025 → Mar 4 2025 |
Publication series
Name | Proceedings of the AAAI Conference on Artificial Intelligence |
---|---|
Number | 7 |
Volume | 39 |
ISSN (Print) | 2159-5399 |
ISSN (Electronic) | 2374-3468 |
Conference
Conference | 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025 |
---|---|
Country/Territory | United States |
City | Philadelphia |
Period | 2/25/25 → 3/4/25 |
Bibliographical note
Publisher Copyright: Copyright © 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
ASJC Scopus Subject Areas
- Artificial Intelligence