Augmenting telephony audio data using robust principal component analysis

Mo, Ronald; Lam, Albert Y.S.

Mo, Ronald and Lam, Albert Y.S. (2021) Augmenting telephony audio data using robust principal component analysis. 2020 IEEE Symposium Series on Computational Intelligence (SSCI).


Restricted Access Documents

7013:127082

Augmenting telephony audio data using robust principal component analysis (PDF, 528kB)

Details

Creators:

Mo, Ronald and Lam, Albert Y.S.

Description/Abstract:

Audio augmentation (e.g., corrupting audio data with noise) has been shown to improve the performance of Automatic Speech Recognition (ASR) systems for low-resource languages. In light of this, we ask whether corrupting speech data with telephone channel characteristics (e.g., background music, artifacts caused by downsampling) also improves the performance of ASR systems. In this work, we explore applying Sound Source Separation (SSS) approaches to capture telephone channel characteristics. We are particularly interested in Robust Principal Component Analysis (RPCA), an unsupervised approach used for various SSS tasks. Our results show that augmenting a clean speech corpus with telephone channel characteristics yields a more robust ASR system, with a 7.8% reduction in Word Error Rate (WER). We also find that the characteristic with the lowest spectral features improves ASR the most.
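The abstract names RPCA as the SSS method but does not spell out the algorithm or the mixing step. A minimal sketch, assuming the standard principal component pursuit formulation (minimize the nuclear norm of L plus λ times the L1 norm of S, subject to L + S = M) solved with the inexact augmented Lagrange multiplier method, is shown here. In RPCA-based separation, M is typically a magnitude spectrogram, with the low-rank part L capturing repetitive background and the sparse part S capturing the foreground. Which component corresponds to a "telephone channel characteristic", and how it is added to clean speech, are not stated in this record; the helper names `rpca_ialm` and `mix_at_snr`, the SNR-based mixing, and all default parameters are hypothetical.

```python
import numpy as np


def rpca_ialm(M, lam=None, tol=1e-7, max_iter=1000, rho=1.5):
    """Decompose M into low-rank L and sparse S (M ~= L + S).

    Hypothetical helper: principal component pursuit solved by the
    inexact augmented Lagrange multiplier (IALM) method. In audio
    source separation, M would typically be a magnitude spectrogram
    (frequency bins x frames).
    """
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))  # standard PCP weight
    norm_two = np.linalg.norm(M, 2)  # largest singular value
    norm_fro = np.linalg.norm(M, "fro")
    # Initialise the dual variable so that its dual norm is at most 1.
    Y = M / max(norm_two, np.abs(M).max() / lam)
    mu = 1.25 / norm_two
    mu_max = mu * 1e7
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(max_iter):
        # L-step: singular value thresholding at 1/mu.
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # S-step: elementwise soft thresholding at lam/mu.
        T = M - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        # Dual ascent on the constraint M = L + S.
        Z = M - L - S
        Y = Y + mu * Z
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(Z, "fro") / norm_fro < tol:
            break
    return L, S


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean` so the result has the given SNR in dB.

    Hypothetical augmentation step: the noise signal is tiled or
    truncated to the length of the clean signal before scaling.
    """
    clean = np.asarray(clean, dtype=float)
    noise = np.resize(np.asarray(noise, dtype=float), clean.shape)
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise
```

Turning L or S back into a waveform would additionally require an inverse STFT (for example, reusing the mixture phase), which this sketch leaves out.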

Item Type:

Article

Official URL:

http://dx.doi.org/10.1109/SSCI47803.2020.9308406

Date:

5 January 2021

Identification Number:

10.1109/SSCI47803.2020.9308406

Uncontrolled Keywords or tags:

audio, augmentation, automatic speech recognition

Schools:

School of Games & Creative Technology