多任务学习框架下的声事件定位与检测损失函数设计

Jinbo Hu; Yin Cao; Ming Wu; Feiran Yang; Jun Yang

doi:10.12395/0371-0025.2024361

Translated title of the contribution: Loss function design for sound event localization and detection based on multi-task learning

Jinbo Hu, Yin Cao, Ming Wu, Feiran Yang, Jun Yang^*

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

The track-wise multi-task learning approach is highly effective at detecting overlapping sound sources in sound event localization and detection. However, as the number of predicted event classes increases, track-wise multi-task networks often produce sparse outputs, which leads to missed detections of sound events. To address this issue, this paper introduces an aggregated loss function that reformulates the multi-task learning framework as a single-task learning problem by coupling the activity of each sound event with its Cartesian direction-of-arrival vector. Furthermore, to exploit the track-wise output format, auxiliary duplicated targets are introduced: events from active tracks are replicated into inactive ones to optimize the system outputs. Experimental results on a large-scale synthetic test set with 170 event classes show that the proposed method significantly improves sound event detection performance, effectively reduces the missed-detection rate, and substantially improves localization and trajectory tracking. Experimental results on a real-scene dataset further demonstrate the effectiveness of the proposed methods.
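The two ideas named in the abstract can be illustrated with a small sketch. This is not the paper's formulation, which the abstract does not give. It assumes that coupling means scaling a unit direction-of-arrival vector by a 0/1 activity, that the loss is a mean squared error on the coupled vectors, and that inactive tracks are filled by cycling through the active ones. The event-class dimension is left out for brevity, and all function names are hypothetical.

```python
from typing import List, Optional, Tuple

Vec = Tuple[float, float, float]  # Cartesian (x, y, z) direction-of-arrival vector


def coupled_target(active: bool, doa: Vec) -> Vec:
    """Scale the DOA vector by a 0/1 activity (assumed form of the coupling)."""
    a = 1.0 if active else 0.0
    return (a * doa[0], a * doa[1], a * doa[2])


def duplicate_into_inactive(tracks: List[Optional[Vec]]) -> List[Vec]:
    """Fill inactive tracks (None) with copies of active-track targets.

    Assumption: inactive tracks cycle through the active ones in order.
    If no track is active, every track gets the zero vector.
    """
    active = [t for t in tracks if t is not None]
    if not active:
        return [(0.0, 0.0, 0.0) for _ in tracks]
    filled: List[Vec] = []
    k = 0
    for t in tracks:
        if t is None:
            filled.append(active[k % len(active)])
            k += 1
        else:
            filled.append(t)
    return filled


def mse(pred: List[Vec], target: List[Vec]) -> float:
    """Mean squared error over all vector components of all tracks."""
    n = 3 * len(target)
    total = sum(
        (p - t) ** 2
        for pv, tv in zip(pred, target)
        for p, t in zip(pv, tv)
    )
    return total / n


# Three tracks, one active event arriving from the +x direction.
tracks = [coupled_target(True, (1.0, 0.0, 0.0)), None, None]
targets = duplicate_into_inactive(tracks)
loss = mse([(0.0, 0.0, 0.0)] * 3, targets)
```

With a single regression target per track, there is no separate detection term to balance against a localization term, which is the sense in which the abstract describes the problem as single-task.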

Translated title of the contribution	Loss function design for sound event localization and detection based on multi-task learning
Original language	Chinese (Simplified)
Pages (from-to)	338-345
Number of pages	8
Journal	Shengxue Xuebao/Acta Acustica
Volume	50
Issue number	2
DOIs	https://doi.org/10.12395/0371-0025.2024361
Publication status	Published - Mar 2025
Externally published	Yes

Bibliographical note

Publisher Copyright:
© 2025 Science Press. All rights reserved.

ASJC Scopus Subject Areas

Acoustics and Ultrasonics

Keywords

Aggregated loss
Auxiliary duplicated target
Event-independent network
Multi-task learning
Sound event localization and detection

Access to Document

10.12395/0371-0025.2024361

Cite this

@article{99ab1e712dc34c4888622e49cd03f000,

title = "多任务学习框架下的声事件定位与检测损失函数设计",

abstract = "The track-wise multi-task learning approach exhibits significant efficacy in detecting overlapping sound sources for sound event localization and detection. However, as the number of predicted event classes increases, the track-wise multi-task networks often produce sparse outputs, resulting in missing alarms of sound events. To address this issue, this paper introduces an aggregated loss function, reformulating the multi-task learning framework into a single-task learning problem by coupling the activity of sound events with its Cartesian direction-of-arrival vector. Furthermore, considering the characteristics of the track-wise output format, auxiliary duplicated targets are introduced to optimize the system outputs by replicating events from active tracks into inactive ones. Experimental results on a large-scale synthetic test set with 170 event classes demonstrate that the proposed method significantly improves the performance in sound event detection, effectively reduces the missing alarm rate, and achieves substantial improvement in localization and trajectory tracking. Additionally, experimental results on the real-scene dataset demonstrate the effectiveness of the proposed methods.",

keywords = "Aggregated loss, Auxiliary duplicated target, Event-independent network, Multi-task learning, Sound event localization and detection",

author = "Jinbo Hu and Yin Cao and Ming Wu and Feiran Yang and Jun Yang",

year = "2025",

month = mar,

doi = "10.12395/0371-0025.2024361",

language = "Chinese (Simplified)",

volume = "50",

pages = "338--345",

journal = "Shengxue Xuebao/Acta Acustica",

issn = "0371-0025",

publisher = "Science Press",

number = "2",

}

TY - JOUR

T1 - 多任务学习框架下的声事件定位与检测损失函数设计

AU - Hu, Jinbo

AU - Cao, Yin

AU - Wu, Ming

AU - Yang, Feiran

AU - Yang, Jun

PY - 2025/3

Y1 - 2025/3

AB - The track-wise multi-task learning approach exhibits significant efficacy in detecting overlapping sound sources for sound event localization and detection. However, as the number of predicted event classes increases, the track-wise multi-task networks often produce sparse outputs, resulting in missing alarms of sound events. To address this issue, this paper introduces an aggregated loss function, reformulating the multi-task learning framework into a single-task learning problem by coupling the activity of sound events with its Cartesian direction-of-arrival vector. Furthermore, considering the characteristics of the track-wise output format, auxiliary duplicated targets are introduced to optimize the system outputs by replicating events from active tracks into inactive ones. Experimental results on a large-scale synthetic test set with 170 event classes demonstrate that the proposed method significantly improves the performance in sound event detection, effectively reduces the missing alarm rate, and achieves substantial improvement in localization and trajectory tracking. Additionally, experimental results on the real-scene dataset demonstrate the effectiveness of the proposed methods.

KW - Aggregated loss

KW - Auxiliary duplicated target

KW - Event-independent network

KW - Multi-task learning

KW - Sound event localization and detection

UR - http://www.scopus.com/inward/record.url?scp=105001001289&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=105001001289&partnerID=8YFLogxK

U2 - 10.12395/0371-0025.2024361

DO - 10.12395/0371-0025.2024361

M3 - Article

AN - SCOPUS:105001001289

SN - 0371-0025

VL - 50

SP - 338

EP - 345

JO - Shengxue Xuebao/Acta Acustica

JF - Shengxue Xuebao/Acta Acustica

IS - 2

ER -
