Augmenting telephony audio data using robust principal component analysis
Mo, Ronald and Lam, Albert Y.S. (2021) Augmenting telephony audio data using robust principal component analysis. 2020 IEEE Symposium Series on Computational Intelligence (SSCI).
Audio augmentation (e.g., corrupting audio data with noise) has been shown to improve the performance of Automatic Speech Recognition (ASR) systems for low-resource languages. In light of this, we are interested in understanding whether corrupting speech data with telephone channel characteristics (e.g., background music, artifacts caused by down-sampling) also improves the performance of ASR systems. In this work, we investigate the possibility of applying Sound Source Separation (SSS) approaches to capture the telephone channel characteristics. We are particularly interested in Robust Principal Component Analysis (RPCA), an unsupervised approach used for various SSS tasks. Our results show that augmenting a clean speech corpus with telephone channel characteristics yields a more robust ASR system, reducing Word Error Rate by 7.8%. We also find that the characteristic with the lowest spectral features improves ASR the most.
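For readers unfamiliar with RPCA, the following is a minimal sketch of the decomposition it performs: a matrix M is split into a low-rank part L and a sparse part S with M ≈ L + S. In audio source separation, M is commonly a magnitude spectrogram. The abstract does not state which solver or parameters the authors used, so everything here is an assumption for illustration: the function names (`rpca`, `shrink`, `svd_threshold`), the fixed-penalty augmented Lagrangian iteration, and the default values of `lam` and `mu` are common textbook choices, not the paper's method.

```python
import numpy as np

def shrink(X, tau):
    # Element-wise soft-thresholding: pulls every entry toward zero by tau.
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    # Soft-threshold the singular values, which encourages a low-rank result.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt

def rpca(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Split M into low-rank L and sparse S (illustrative sketch only)."""
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))                  # assumed default weight
    if mu is None:
        mu = 0.25 * m * n / (np.abs(M).sum() + 1e-12)   # assumed default penalty
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                                # Lagrange multiplier
    norm_M = np.linalg.norm(M) + 1e-12
    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)     # low-rank update
        S = shrink(M - L + Y / mu, lam / mu)            # sparse update
        R = M - L - S                                   # constraint residual
        Y = Y + mu * R
        if np.linalg.norm(R) / norm_M < tol:
            break
    return L, S
```

Applied to a spectrogram, `rpca(np.abs(stft_matrix))` would return two matrices of the same shape; how the separated components are then mixed into clean speech for augmentation is described in the paper itself, not here.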
