Phonetic segmentation using statistical correction and multi-resolution fusion

Sixuan Zhao, Ing Yann Soon, Soo Ngee Koh, Kang Kwong Luke

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

This paper focuses on generating accurate phonetic segmentations. Statistical methods based on absolute and relative correction are discussed and evaluated on both monophone and biphone models to improve the segmentation results. The influence of the search range on the statistical correction process is studied, and a state selection technique is used to enhance the correction results. The paper also explores the influence of the resolution (step size) of the HMMs and proposes a multi-resolution fusion process to further refine the statistically corrected results. Applying the proposed refinement methods improves the segmentation results in terms of segmentation accuracy, mean absolute error (MAE), and root mean square error (RMSE).
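
As an illustration of the three evaluation measures named in the abstract, the following is a minimal sketch of how they are commonly computed from boundary times. It is not the authors' code: the function name `boundary_metrics`, the sample boundary values, and the 20 ms tolerance used for segmentation accuracy are assumptions made for this example, and it assumes predicted and reference boundaries are already paired one-to-one.

```python
import math


def boundary_metrics(predicted_ms, reference_ms, tolerance_ms=20.0):
    """Return (accuracy, MAE, RMSE) for paired phone boundaries in milliseconds.

    Accuracy is taken here as the fraction of boundaries whose absolute
    error is within tolerance_ms (the 20 ms default is an assumption).
    """
    if len(predicted_ms) != len(reference_ms) or not predicted_ms:
        raise ValueError("need two non-empty boundary lists of equal length")

    # Signed error of each automatic boundary against its reference boundary.
    errors = [p - r for p, r in zip(predicted_ms, reference_ms)]
    n = len(errors)

    accuracy = sum(abs(e) <= tolerance_ms for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return accuracy, mae, rmse


# Hypothetical boundaries, for illustration only.
predicted = [100.0, 210.0, 330.0, 480.0]
reference = [105.0, 200.0, 300.0, 470.0]
accuracy, mae, rmse = boundary_metrics(predicted, reference)
print(f"accuracy={accuracy:.2f} MAE={mae:.2f} ms RMSE={rmse:.2f} ms")
```

RMSE penalizes large boundary errors more heavily than MAE, which is one reason the two are often reported together.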

Original language: English
Title of host publication: 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings
Pages: 6694-6698
Number of pages: 5
Publication status: Published - Oct 18 2013
Externally published: Yes
Event: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Vancouver, BC, Canada
Duration: May 26 2013 to May 31 2013

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
Country/Territory: Canada
City: Vancouver, BC
Period: 5/26/13 to 5/31/13

ASJC Scopus Subject Areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Keywords

  • multi-resolution
  • phonetic segmentation
  • state selection
  • statistical correction
