Correlational Image Modeling for Self-Supervised Visual Pre-Training

Wei Li; Jiahao Xie; Chen Change Loy

doi:10.1109/CVPR52729.2023.01450

Correlational Image Modeling for Self-Supervised Visual Pre-Training

Wei Li, Jiahao Xie, Chen Change Loy

Nanyang Technological University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

14 Citations (Scopus)

Abstract

We introduce Correlational Image Modeling (CIM), a novel and surprisingly effective approach to self-supervised visual pre-training. Our CIM performs a simple pretext task: we randomly crop image regions (exemplars) from an input image (context) and predict correlation maps between the exemplars and the context. Three key designs enable correlational image modeling as a nontrivial and meaningful self-supervisory task. First, to generate useful exemplar-context pairs, we consider cropping image regions with various scales, shapes, rotations, and transformations. Second, we employ a bootstrap learning framework that involves online and target encoders. During pre-training, the former takes exemplars as inputs while the latter converts the context. Third, we model the output correlation maps via a simple cross-attention block, within which the context serves as queries and the exemplars offer values and keys. We show that CIM performs on par or better than the current state of the art on self-supervised and transfer benchmarks. Code is available at https://github.com/weivision/Correlational-Image-Modeling.git.

Original language	English
Title of host publication	Proceedings - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023
Publisher	IEEE Computer Society
Pages	15105-15115
Number of pages	11
ISBN (Electronic)	9798350301298
DOIs	https://doi.org/10.1109/CVPR52729.2023.01450
Publication status	Published - 2023
Externally published	Yes
Event	2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023 - Vancouver, Canada Duration: Jun 18 2023 → Jun 22 2023

Publication series

Name	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume	2023-June
ISSN (Print)	1063-6919

Conference

Conference	2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023
Country/Territory	Canada
City	Vancouver
Period	6/18/23 → 6/22/23

Bibliographical note

Publisher Copyright:
© 2023 IEEE.

ASJC Scopus Subject Areas

Software
Computer Vision and Pattern Recognition

Keywords

Self-supervised or unsupervised representation learning

Access to Document

10.1109/CVPR52729.2023.01450

Cite this

Li, W., Xie, J., & Loy, C. C. (2023). Correlational Image Modeling for Self-Supervised Visual Pre-Training. In Proceedings - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023 (pp. 15105-15115). (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition; Vol. 2023-June). IEEE Computer Society. https://doi.org/10.1109/CVPR52729.2023.01450

@inproceedings{29232fc083864e5fa1a425af290a98c7,

title = "Correlational Image Modeling for Self-Supervised Visual Pre-Training",

abstract = "We introduce Correlational Image Modeling (CIM), a novel and surprisingly effective approach to self-supervised visual pre-training. Our CIM performs a simple pretext task: we randomly crop image regions (exemplars) from an input image (context) and predict correlation maps between the exemplars and the context. Three key designs enable correlational image modeling as a nontrivial and meaningful self-supervisory task. First, to generate useful exemplar-context pairs, we consider cropping image regions with various scales, shapes, rotations, and transformations. Second, we employ a bootstrap learning framework that involves online and target encoders. During pre-training, the former takes exemplars as inputs while the latter converts the context. Third, we model the output correlation maps via a simple cross-attention block, within which the context serves as queries and the exemplars offer values and keys. We show that CIM performs on par or better than the current state of the art on self-supervised and transfer benchmarks. Code is available at https://github.com/weivision/Correlational-Image-Modeling.git.",

keywords = "Self-supervised or unsupervised representation learning",

author = "Wei Li and Jiahao Xie and Loy, \{Chen Change\}",

note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023 ; Conference date: 18-06-2023 Through 22-06-2023",

year = "2023",

doi = "10.1109/CVPR52729.2023.01450",

language = "English",

series = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",

publisher = "IEEE Computer Society",

pages = "15105--15115",

booktitle = "Proceedings - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023",

address = "United States",

}

Li, W, Xie, J & Loy, CC 2023, Correlational Image Modeling for Self-Supervised Visual Pre-Training. in Proceedings - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2023-June, IEEE Computer Society, pp. 15105-15115, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, Canada, 6/18/23. https://doi.org/10.1109/CVPR52729.2023.01450

Correlational Image Modeling for Self-Supervised Visual Pre-Training. / Li, Wei; Xie, Jiahao; Loy, Chen Change.
Proceedings - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023. IEEE Computer Society, 2023. p. 15105-15115 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition; Vol. 2023-June).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

TY - GEN

T1 - Correlational Image Modeling for Self-Supervised Visual Pre-Training

AU - Li, Wei

AU - Xie, Jiahao

AU - Loy, Chen Change

PY - 2023

Y1 - 2023

N2 - We introduce Correlational Image Modeling (CIM), a novel and surprisingly effective approach to self-supervised visual pre-training. Our CIM performs a simple pretext task: we randomly crop image regions (exemplars) from an input image (context) and predict correlation maps between the exemplars and the context. Three key designs enable correlational image modeling as a nontrivial and meaningful self-supervisory task. First, to generate useful exemplar-context pairs, we consider cropping image regions with various scales, shapes, rotations, and transformations. Second, we employ a bootstrap learning framework that involves online and target encoders. During pre-training, the former takes exemplars as inputs while the latter converts the context. Third, we model the output correlation maps via a simple cross-attention block, within which the context serves as queries and the exemplars offer values and keys. We show that CIM performs on par or better than the current state of the art on self-supervised and transfer benchmarks. Code is available at https://github.com/weivision/Correlational-Image-Modeling.git.

AB - We introduce Correlational Image Modeling (CIM), a novel and surprisingly effective approach to self-supervised visual pre-training. Our CIM performs a simple pretext task: we randomly crop image regions (exemplars) from an input image (context) and predict correlation maps between the exemplars and the context. Three key designs enable correlational image modeling as a nontrivial and meaningful self-supervisory task. First, to generate useful exemplar-context pairs, we consider cropping image regions with various scales, shapes, rotations, and transformations. Second, we employ a bootstrap learning framework that involves online and target encoders. During pre-training, the former takes exemplars as inputs while the latter converts the context. Third, we model the output correlation maps via a simple cross-attention block, within which the context serves as queries and the exemplars offer values and keys. We show that CIM performs on par or better than the current state of the art on self-supervised and transfer benchmarks. Code is available at https://github.com/weivision/Correlational-Image-Modeling.git.

KW - Self-supervised or unsupervised representation learning

UR - http://www.scopus.com/inward/record.url?scp=85173974401&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85173974401&partnerID=8YFLogxK

U2 - 10.1109/CVPR52729.2023.01450

DO - 10.1109/CVPR52729.2023.01450

M3 - Conference contribution

AN - SCOPUS:85173974401

T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

SP - 15105

EP - 15115

BT - Proceedings - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023

PB - IEEE Computer Society

T2 - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023

Y2 - 18 June 2023 through 22 June 2023

ER -

Correlational Image Modeling for Self-Supervised Visual Pre-Training

Abstract

Publication series

Conference

Bibliographical note

ASJC Scopus Subject Areas

Keywords

Access to Document

Other files and links

Cite this