Computation-Efficient Knowledge Distillation via Uncertainty-Aware Mixup

Guodong Xu*, Ziwei Liu, Chen Change Loy

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

26 Citations (Scopus)

Abstract

Knowledge distillation (KD) has emerged as an essential technique not only for model compression, but also for other learning tasks such as continual learning. Given the richer application spectrum and the potential online usage of KD, knowledge distillation efficiency becomes a pivotal component. In this work, we study this little-explored but important topic. Unlike previous works that focus solely on the accuracy of the student network, we attempt to achieve a harder goal: obtaining performance comparable to conventional KD at a lower computation cost during the transfer. To this end, we present UNcertainty-aware mIXup (UNIX), an effective approach that reduces the transfer cost by 20% to 30% while maintaining comparable, or even better, student performance than conventional KD. This is made possible by effective uncertainty sampling and a novel adaptive mixup approach that dynamically selects informative samples from ample data and compacts the knowledge in these samples. We show that our approach inherently performs hard sample mining. We demonstrate the applicability of our approach to improving various existing KD approaches by reducing their queries to a teacher network. Extensive experiments are performed on CIFAR100 and ImageNet. Code and models are available at https://github.com/xuguodong03/UNIXKD.
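The abstract only sketches the mechanism, so the snippet below is a minimal, hypothetical PyTorch illustration of the idea: rank a batch by student uncertainty, keep the most uncertain samples, and mix the confident samples into the kept ones before querying the teacher, so that fewer teacher forward passes are needed. The uncertainty measure (prediction entropy), the fixed mixup coefficient `alpha`, the `keep_ratio` parameter, and the function name are assumptions made for illustration; they are not the authors' released implementation, which is available in the linked repository.

```python
import torch
import torch.nn.functional as F

def uncertainty_aware_mixup_batch(student_logits, images, keep_ratio=0.75, alpha=0.5):
    """Shrink a batch before querying the teacher (illustrative sketch).

    Samples are ranked by the entropy of the student's predictions (a common
    uncertainty proxy). The most uncertain samples are kept, and the remaining
    confident samples are mixed into them so their information is not simply
    discarded. The teacher is then queried only on the smaller mixed batch.
    """
    probs = F.softmax(student_logits, dim=1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)

    n = images.size(0)
    n_keep = max(1, int(keep_ratio * n))
    order = torch.argsort(entropy, descending=True)
    keep_idx = order[:n_keep]      # uncertain samples: kept as-is
    drop_idx = order[n_keep:]      # confident samples: mixed into the kept ones

    mixed = images[keep_idx].clone()
    if drop_idx.numel() > 0:
        # Assign each dropped sample to one kept slot (unique when keep_ratio >= 0.5).
        pos = torch.arange(drop_idx.numel(), device=images.device) % n_keep
        mixed[pos] = (1.0 - alpha) * mixed[pos] + alpha * images[drop_idx]

    return mixed, keep_idx

# Toy usage with random tensors standing in for a real batch and a real student.
if __name__ == "__main__":
    torch.manual_seed(0)
    images = torch.randn(8, 3, 32, 32)
    student_logits = torch.randn(8, 100)  # pretend student output on the full batch
    mixed, keep_idx = uncertainty_aware_mixup_batch(student_logits, images)
    print(mixed.shape)  # fewer samples than the original batch -> cheaper teacher queries
```

In this sketch the student still sees the full batch (its forward pass is cheap), while the expensive teacher network is queried only on the reduced, mixed batch, which is the source of the computation savings the abstract describes.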

Original language: English
Article number: 109338
Journal: Pattern Recognition
Volume: 138
DOIs
Publication status: Published - Jun 2023
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2023

ASJC Scopus Subject Areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence

Keywords

  • Knowledge distillation
  • Training cost
