Dense Siamese Network for Dense Unsupervised Learning

Wenwei Zhang; Jiangmiao Pang; Kai Chen; Chen Change Loy

doi:10.1007/978-3-031-20056-4_27

Wenwei Zhang, Jiangmiao Pang, Kai Chen, Chen Change Loy*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

12 Citations (Scopus)

Abstract

This paper presents Dense Siamese Network (DenseSiam), a simple unsupervised learning framework for dense prediction tasks. It learns visual representations by maximizing the similarity between two views of one image with two types of consistency, i.e., pixel consistency and region consistency. Concretely, DenseSiam first maximizes the pixel level spatial consistency according to the exact location correspondence in the overlapped area. It also extracts a batch of region embeddings that correspond to some sub-regions in the overlapped area to be contrasted for region consistency. In contrast to previous methods that require negative pixel pairs, momentum encoders or heuristic masks, DenseSiam benefits from the simple Siamese network and optimizes the consistency of different granularities. It also proves that the simple location correspondence and interacted region embeddings are effective enough to learn the similarity. We apply DenseSiam on ImageNet and obtain competitive improvements on various downstream tasks. We also show that only with some extra task-specific losses, the simple framework can directly conduct dense prediction tasks. On an existing unsupervised semantic segmentation benchmark, it surpasses state-of-the-art segmentation methods by 2.1 mIoU with 28% training costs. Code and models are released at https://github.com/ZwwWayne/DenseSiam.
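
The pixel consistency described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (the released code at the URL above is the reference). It assumes that each view is an axis-aligned crop given as a (top, left, bottom, right) box on a shared feature grid, that features are plain nested lists, and that consistency is measured by cosine similarity. The names overlap, cosine, crop and pixel_consistency_loss are hypothetical helpers introduced only for this example.

```python
import math


def overlap(box_a, box_b):
    """Intersection of two (top, left, bottom, right) boxes, or None if disjoint."""
    top, left = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    bottom, right = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    if top >= bottom or left >= right:
        return None
    return (top, left, bottom, right)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (small epsilon avoids 0/0)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def crop(grid, box):
    """Cut the (top, left, bottom, right) box out of an H x W x C nested list."""
    top, left, bottom, right = box
    return [row[left:right] for row in grid[top:bottom]]


def pixel_consistency_loss(feat_a, box_a, feat_b, box_b):
    """Negative mean cosine similarity over the locations both views share.

    feat_a and feat_b are indexed relative to their own boxes, so a shared
    grid location (y, x) is looked up at (y - top, x - left) in each view.
    Returns 0.0 when the views do not overlap (an assumption of this sketch).
    """
    shared = overlap(box_a, box_b)
    if shared is None:
        return 0.0
    top, left, bottom, right = shared
    total, count = 0.0, 0
    for y in range(top, bottom):
        for x in range(left, right):
            vec_a = feat_a[y - box_a[0]][x - box_a[1]]
            vec_b = feat_b[y - box_b[0]][x - box_b[1]]
            total += cosine(vec_a, vec_b)
            count += 1
    return -total / count


if __name__ == "__main__":
    # Toy 4 x 4 grid with 2-channel features; both views copy it unchanged,
    # so corresponding locations agree and the loss is at its minimum.
    grid = [[[float(y + 1), float(x + 1)] for x in range(4)] for y in range(4)]
    box_a, box_b = (0, 0, 3, 3), (1, 1, 4, 4)
    loss = pixel_consistency_loss(crop(grid, box_a), box_a, crop(grid, box_b), box_b)
    print(round(loss, 6))  # -1.0
```

In this toy setting the two views share a 2 x 2 area, and minimizing the loss corresponds to maximizing similarity at exactly matching locations. Region consistency, negative-free training and the network itself are outside the scope of this sketch.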

Original language	English
Title of host publication	Computer Vision – ECCV 2022 - 17th European Conference, Proceedings
Editors	Shai Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria Farinella, Tal Hassner
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	464-480
Number of pages	17
ISBN (Print)	9783031200557
DOIs	https://doi.org/10.1007/978-3-031-20056-4_27
Publication status	Published - 2022
Externally published	Yes
Event	17th European Conference on Computer Vision, ECCV 2022 - Tel Aviv, Israel
Duration	Oct 23 2022 → Oct 27 2022

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	13690 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	17th European Conference on Computer Vision, ECCV 2022
Country/Territory	Israel
City	Tel Aviv
Period	10/23/22 → 10/27/22

Bibliographical note

Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.

ASJC Scopus Subject Areas

Theoretical Computer Science
General Computer Science

Cite this

Zhang, W., Pang, J., Chen, K., & Loy, C. C. (2022). Dense Siamese Network for Dense Unsupervised Learning. In S. Avidan, G. Brostow, M. Cissé, G. M. Farinella, & T. Hassner (Eds.), Computer Vision – ECCV 2022 - 17th European Conference, Proceedings (pp. 464-480). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13690 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-20056-4_27

Zhang, Wenwei ; Pang, Jiangmiao ; Chen, Kai et al. / Dense Siamese Network for Dense Unsupervised Learning. Computer Vision – ECCV 2022 - 17th European Conference, Proceedings. editor / Shai Avidan ; Gabriel Brostow ; Moustapha Cissé ; Giovanni Maria Farinella ; Tal Hassner. Springer Science and Business Media Deutschland GmbH, 2022. pp. 464-480 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{f7579b09658b45af8f5b78a2435df983,

title = "Dense Siamese Network for Dense Unsupervised Learning",

abstract = "This paper presents Dense Siamese Network (DenseSiam), a simple unsupervised learning framework for dense prediction tasks. It learns visual representations by maximizing the similarity between two views of one image with two types of consistency, i.e., pixel consistency and region consistency. Concretely, DenseSiam first maximizes the pixel level spatial consistency according to the exact location correspondence in the overlapped area. It also extracts a batch of region embeddings that correspond to some sub-regions in the overlapped area to be contrasted for region consistency. In contrast to previous methods that require negative pixel pairs, momentum encoders or heuristic masks, DenseSiam benefits from the simple Siamese network and optimizes the consistency of different granularities. It also proves that the simple location correspondence and interacted region embeddings are effective enough to learn the similarity. We apply DenseSiam on ImageNet and obtain competitive improvements on various downstream tasks. We also show that only with some extra task-specific losses, the simple framework can directly conduct dense prediction tasks. On an existing unsupervised semantic segmentation benchmark, it surpasses state-of-the-art segmentation methods by 2.1 mIoU with 28\% training costs. Code and models are released at https://github.com/ZwwWayne/DenseSiam.",

author = "Wenwei Zhang and Jiangmiao Pang and Kai Chen and Loy, {Chen Change}",

note = "Publisher Copyright: {\textcopyright} 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.; 17th European Conference on Computer Vision, ECCV 2022 ; Conference date: 23-10-2022 Through 27-10-2022",

year = "2022",

doi = "10.1007/978-3-031-20056-4_27",

language = "English",

isbn = "9783031200557",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "464--480",

editor = "Shai Avidan and Gabriel Brostow and Moustapha Ciss{\'e} and Farinella, {Giovanni Maria} and Tal Hassner",

booktitle = "Computer Vision – ECCV 2022 - 17th European Conference, Proceedings",

address = "Germany",

}

Zhang, W, Pang, J, Chen, K & Loy, CC 2022, Dense Siamese Network for Dense Unsupervised Learning. in S Avidan, G Brostow, M Cissé, GM Farinella & T Hassner (eds), Computer Vision – ECCV 2022 - 17th European Conference, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13690 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 464-480, 17th European Conference on Computer Vision, ECCV 2022, Tel Aviv, Israel, 10/23/22. https://doi.org/10.1007/978-3-031-20056-4_27

Dense Siamese Network for Dense Unsupervised Learning. / Zhang, Wenwei; Pang, Jiangmiao; Chen, Kai et al.
Computer Vision – ECCV 2022 - 17th European Conference, Proceedings. ed. / Shai Avidan; Gabriel Brostow; Moustapha Cissé; Giovanni Maria Farinella; Tal Hassner. Springer Science and Business Media Deutschland GmbH, 2022. p. 464-480 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13690 LNCS).

TY - GEN

T1 - Dense Siamese Network for Dense Unsupervised Learning

AU - Zhang, Wenwei

AU - Pang, Jiangmiao

AU - Chen, Kai

AU - Loy, Chen Change

PY - 2022

Y1 - 2022

AB - This paper presents Dense Siamese Network (DenseSiam), a simple unsupervised learning framework for dense prediction tasks. It learns visual representations by maximizing the similarity between two views of one image with two types of consistency, i.e., pixel consistency and region consistency. Concretely, DenseSiam first maximizes the pixel level spatial consistency according to the exact location correspondence in the overlapped area. It also extracts a batch of region embeddings that correspond to some sub-regions in the overlapped area to be contrasted for region consistency. In contrast to previous methods that require negative pixel pairs, momentum encoders or heuristic masks, DenseSiam benefits from the simple Siamese network and optimizes the consistency of different granularities. It also proves that the simple location correspondence and interacted region embeddings are effective enough to learn the similarity. We apply DenseSiam on ImageNet and obtain competitive improvements on various downstream tasks. We also show that only with some extra task-specific losses, the simple framework can directly conduct dense prediction tasks. On an existing unsupervised semantic segmentation benchmark, it surpasses state-of-the-art segmentation methods by 2.1 mIoU with 28% training costs. Code and models are released at https://github.com/ZwwWayne/DenseSiam.

UR - http://www.scopus.com/inward/record.url?scp=85144569361&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85144569361&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-20056-4_27

DO - 10.1007/978-3-031-20056-4_27

M3 - Conference contribution

AN - SCOPUS:85144569361

SN - 9783031200557

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 464

EP - 480

BT - Computer Vision – ECCV 2022 - 17th European Conference, Proceedings

A2 - Avidan, Shai

A2 - Brostow, Gabriel

A2 - Cissé, Moustapha

A2 - Farinella, Giovanni Maria

A2 - Hassner, Tal

PB - Springer Science and Business Media Deutschland GmbH

T2 - 17th European Conference on Computer Vision, ECCV 2022

Y2 - 23 October 2022 through 27 October 2022

ER -

Zhang W, Pang J, Chen K, Loy CC. Dense Siamese Network for Dense Unsupervised Learning. In Avidan S, Brostow G, Cissé M, Farinella GM, Hassner T, editors, Computer Vision – ECCV 2022 - 17th European Conference, Proceedings. Springer Science and Business Media Deutschland GmbH. 2022. p. 464-480. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-031-20056-4_27