Multichannel Linear Prediction-Based Speech Dereverberation Considering Sparse and Low-Rank Priors

Taihui Wang; Feiran Yang; Jun Yang

doi:10.1109/TASLP.2024.3369535

Taihui Wang, Feiran Yang^*, Jun Yang

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

This article addresses the multichannel linear prediction (MCLP)-based speech dereverberation problem by jointly considering the sparse and low-rank priors of speech spectrograms. We use the complex generalized Gaussian (CGG) distribution as the source model and generalized nonnegative matrix factorization (NMF) as the spectral model. The presented model differs from existing MCLP models in two ways. First, we adopt the CGG distribution with a time-frequency-variant scale parameter instead of a time-frequency-invariant one. Second, the time-frequency-variant scale parameter is approximated by NMF in a low-rank manner. Based on the maximum-likelihood criterion, speech dereverberation is formulated as an optimization problem that minimizes the prediction error weighted by the reciprocal of the sparse and low-rank parameters. A convergence-guaranteed algorithm is derived to estimate the parameters using the majorization-minimization technique. WPE, NMF-based WPE, and CGG-based WPE can be treated as special cases of the proposed method with different shape and domain parameters. As a byproduct, the proposed method provides a simple and elegant way to derive the CGG-based WPE algorithm. A series of experiments shows the superiority of the proposed method over the WPE, NMF-based WPE, and CGG-based WPE methods.
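
For orientation, the conventional WPE baseline that the abstract names as a special case can be sketched as follows. This is a minimal illustration of the general idea of minimizing a prediction error weighted by the reciprocal of a time-varying parameter, not the authors' proposed algorithm: the CGG source model and the NMF approximation of the scale parameter are not implemented here. The function name `wpe_one_bin`, its parameters, and the choice of a channel-averaged power as the weight are assumptions made for this sketch.

```python
import numpy as np

def wpe_one_bin(X, taps=10, delay=3, iterations=3, eps=1e-8):
    """Conventional WPE for a single frequency bin (illustrative sketch).

    X: complex array of shape (channels, frames), one STFT frequency bin.
    Returns the dereverberated signal with the same shape.
    """
    M, N = X.shape
    # Stack delayed observations: block t holds X shifted by (delay + t) frames.
    X_tilde = np.zeros((M * taps, N), dtype=complex)
    for t in range(taps):
        shift = delay + t
        if shift < N:
            X_tilde[t * M:(t + 1) * M, shift:] = X[:, :N - shift]

    D = X.astype(complex)
    for _ in range(iterations):
        # Assumption: time-varying weight = channel-averaged power of the
        # current estimate, floored to avoid division by zero.
        lam = np.maximum(np.mean(np.abs(D) ** 2, axis=0), eps)
        Xw = X_tilde / lam                      # weight each frame by 1 / lam
        R = Xw @ X_tilde.conj().T               # weighted correlation matrix
        P = Xw @ X.conj().T                     # weighted cross-correlation
        G = np.linalg.solve(R + eps * np.eye(M * taps), P)
        D = X - G.conj().T @ X_tilde            # subtract predicted late reverb
    return D
```

Each iteration alternates between updating the weights from the current estimate and solving a weighted least-squares problem for the prediction filter. According to the abstract, the proposed method changes how the weights are modeled (a CGG scale parameter approximated by NMF) rather than this overall alternating structure.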

Original language	English
Pages (from-to)	1724-1735
Number of pages	12
Journal	IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume	32
DOIs	https://doi.org/10.1109/TASLP.2024.3369535
Publication status	Published - 2024
Externally published	Yes

Bibliographical note

Publisher Copyright:
© 2014 IEEE.

ASJC Scopus Subject Areas

Computer Science (miscellaneous)
Acoustics and Ultrasonics
Computational Mathematics
Electrical and Electronic Engineering

Keywords

complex generalized Gaussian
multichannel linear prediction
nonnegative matrix factorization
Speech dereverberation
weighted prediction error

Access to Document

10.1109/TASLP.2024.3369535

Cite this

@article{59af045dcc2246cb893880fd58b0583b,

title = "Multichannel Linear Prediction-Based Speech Dereverberation Considering Sparse and Low-Rank Priors",

abstract = "This article addresses the multichannel linear prediction (MCLP)-based speech dereverberation problem by jointly considering the sparse and low-rank priors of speech spectrograms. We use the complex generalized Gaussian (CGG) distribution as the source model and generalized nonnegative matrix factorization (NMF) as the spectral model. The presented model differs from existing MCLP models in two ways. First, we adopt the CGG distribution with a time-frequency-variant scale parameter instead of a time-frequency-invariant one. Second, the time-frequency-variant scale parameter is approximated by NMF in a low-rank manner. Based on the maximum-likelihood criterion, speech dereverberation is formulated as an optimization problem that minimizes the prediction error weighted by the reciprocal of the sparse and low-rank parameters. A convergence-guaranteed algorithm is derived to estimate the parameters using the majorization-minimization technique. WPE, NMF-based WPE, and CGG-based WPE can be treated as special cases of the proposed method with different shape and domain parameters. As a byproduct, the proposed method provides a simple and elegant way to derive the CGG-based WPE algorithm. A series of experiments shows the superiority of the proposed method over the WPE, NMF-based WPE, and CGG-based WPE methods.",

keywords = "complex generalized Gaussian, multichannel linear prediction, nonnegative matrix factorization, Speech dereverberation, weighted prediction error",

author = "Taihui Wang and Feiran Yang and Jun Yang",

note = "Publisher Copyright: {\textcopyright} 2014 IEEE.",

year = "2024",

doi = "10.1109/TASLP.2024.3369535",

language = "English",

volume = "32",

pages = "1724--1735",

journal = "IEEE/ACM Transactions on Audio, Speech, and Language Processing",

issn = "2329-9290",

publisher = "IEEE Advancing Technology for Humanity",

}

TY - JOUR

T1 - Multichannel Linear Prediction-Based Speech Dereverberation Considering Sparse and Low-Rank Priors

AU - Wang, Taihui

AU - Yang, Feiran

AU - Yang, Jun

PY - 2024

Y1 - 2024

N2 - This article addresses the multichannel linear prediction (MCLP)-based speech dereverberation problem by jointly considering the sparse and low-rank priors of speech spectrograms. We use the complex generalized Gaussian (CGG) distribution as the source model and generalized nonnegative matrix factorization (NMF) as the spectral model. The presented model differs from existing MCLP models in two ways. First, we adopt the CGG distribution with a time-frequency-variant scale parameter instead of a time-frequency-invariant one. Second, the time-frequency-variant scale parameter is approximated by NMF in a low-rank manner. Based on the maximum-likelihood criterion, speech dereverberation is formulated as an optimization problem that minimizes the prediction error weighted by the reciprocal of the sparse and low-rank parameters. A convergence-guaranteed algorithm is derived to estimate the parameters using the majorization-minimization technique. WPE, NMF-based WPE, and CGG-based WPE can be treated as special cases of the proposed method with different shape and domain parameters. As a byproduct, the proposed method provides a simple and elegant way to derive the CGG-based WPE algorithm. A series of experiments shows the superiority of the proposed method over the WPE, NMF-based WPE, and CGG-based WPE methods.

AB - This article addresses the multichannel linear prediction (MCLP)-based speech dereverberation problem by jointly considering the sparse and low-rank priors of speech spectrograms. We use the complex generalized Gaussian (CGG) distribution as the source model and generalized nonnegative matrix factorization (NMF) as the spectral model. The presented model differs from existing MCLP models in two ways. First, we adopt the CGG distribution with a time-frequency-variant scale parameter instead of a time-frequency-invariant one. Second, the time-frequency-variant scale parameter is approximated by NMF in a low-rank manner. Based on the maximum-likelihood criterion, speech dereverberation is formulated as an optimization problem that minimizes the prediction error weighted by the reciprocal of the sparse and low-rank parameters. A convergence-guaranteed algorithm is derived to estimate the parameters using the majorization-minimization technique. WPE, NMF-based WPE, and CGG-based WPE can be treated as special cases of the proposed method with different shape and domain parameters. As a byproduct, the proposed method provides a simple and elegant way to derive the CGG-based WPE algorithm. A series of experiments shows the superiority of the proposed method over the WPE, NMF-based WPE, and CGG-based WPE methods.

KW - complex generalized Gaussian

KW - multichannel linear prediction

KW - nonnegative matrix factorization

KW - Speech dereverberation

KW - weighted prediction error

UR - http://www.scopus.com/inward/record.url?scp=85186090149&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85186090149&partnerID=8YFLogxK

U2 - 10.1109/TASLP.2024.3369535

DO - 10.1109/TASLP.2024.3369535

M3 - Article

AN - SCOPUS:85186090149

SN - 2329-9290

VL - 32

SP - 1724

EP - 1735

JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing

JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing

ER -
