Data sufficiency analysis for automatic speech recognition

Badenhorst, Jacob Andreas Cornelius


Files

badenhorst_jacobac.pdf (1.08 MB)

Date

2009

Authors

Badenhorst, Jacob Andreas Cornelius

Supervisors

Barnard, E.
Davel, M.H.

Publisher

North-West University

Abstract

The languages spoken in developing countries are diverse, and most are currently under-resourced from an automatic speech recognition (ASR) perspective. In South Africa alone, 10 of the 11 official languages belong to this category. Given the potential for future applications of speech-based information systems such as spoken dialog systems (SDSs) in these countries, the design of minimal ASR audio corpora is an important research area. Specifically, current ASR systems utilise acoustic models to represent acoustic variability, and effective ASR corpus design aims to optimise the amount of relevant variation within the training data while minimising the size of the corpus. Therefore, an investigation of the effect that different amounts and types of training data have on these models is needed. In this dissertation, specific consideration is given to the data sufficiency principles that apply to the training of acoustic models. The investigation of this task led to the following main achievements: 1) We define a new stability measurement protocol that makes it possible to observe the variability of ASR training data. 2) This protocol allows us to investigate the effect that various acoustic model complexities and ASR normalisation techniques have on ASR training data requirements. Specific trends are observed in the data requirements of different phone categories and in how these are affected by various modelling strategies. 3) Based on this analysis, acoustic distances between phones are estimated across language borders, paving the way for further research in cross-language data sharing. Finally, the knowledge obtained from these experiments is applied to perform a data sufficiency analysis of a new speech recognition corpus of South African languages: the Lwazi ASR corpus. The findings correlate well with initial phone recognition results and yield insight into the number of speakers sufficient for the development of minimal telephone ASR corpora.
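The abstract does not spell out how the stability measurement protocol or the cross-language phone distances are computed. As a rough illustration only, the Python sketch below assumes that each phone is modelled by a single diagonal-covariance Gaussian over acoustic feature frames (for example MFCCs) and uses the Bhattacharyya distance as the model distance. Both choices, and the helper names fit_diag_gaussian, bhattacharyya and stability_curve, are hypothetical and are not taken from the thesis. The idea sketched is: train two models on disjoint random subsets of n samples, measure how far apart they are, and watch that distance shrink as n grows.

```python
import math
import random
from statistics import fmean, pvariance

# Hypothetical helpers: a generic sketch, not the thesis's own protocol.

def fit_diag_gaussian(frames):
    """Maximum-likelihood mean and variance for each feature dimension."""
    dims = len(frames[0])
    means = [fmean(f[d] for f in frames) for d in range(dims)]
    # Floor the variances so tiny subsets cannot produce a zero variance.
    variances = [max(pvariance([f[d] for f in frames], mu=means[d]), 1e-6)
                 for d in range(dims)]
    return means, variances


def bhattacharyya(g1, g2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    (m1, v1), (m2, v2) = g1, g2
    total = 0.0
    for a, b, va, vb in zip(m1, m2, v1, v2):
        avg = 0.5 * (va + vb)
        total += (a - b) ** 2 / (8.0 * avg)                # mean-separation term
        total += 0.5 * math.log(avg / math.sqrt(va * vb))  # variance-mismatch term
    return total


def stability_curve(frames, sizes, trials=20, seed=0):
    """Mean distance between models trained on disjoint random subsets of size n.

    A curve that flattens towards zero suggests that n samples are enough for
    the estimated model to stop changing much from one subset to the next.
    """
    rng = random.Random(seed)
    curve = {}
    for n in sizes:
        if n < 2 or 2 * n > len(frames):
            raise ValueError("each size needs 2 <= n and 2 * n <= len(frames)")
        dists = []
        for _ in range(trials):
            sample = rng.sample(frames, 2 * n)  # 2n distinct frames, split in half
            first = fit_diag_gaussian(sample[:n])
            second = fit_diag_gaussian(sample[n:])
            dists.append(bhattacharyya(first, second))
        curve[n] = fmean(dists)
    return curve


if __name__ == "__main__":
    # Synthetic 2-D "frames" standing in for the feature vectors of one phone.
    gen = random.Random(1)
    frames = [[gen.gauss(0.0, 1.0), gen.gauss(2.0, 0.5)] for _ in range(4000)]
    for n, d in stability_curve(frames, [10, 100, 1000]).items():
        print(f"n={n:5d}  mean distance={d:.5f}")
```

Run as a script, it prints one mean distance per subset size for synthetic data; on such data the distance should fall steadily as n increases. The same bhattacharyya function could in principle compare the models of two phones, possibly from different languages, but whether this resembles the distance measure used in the thesis is not stated in this record.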

Description

Thesis (M. Ing. (Computer and Electronical Engineering))--North-West University, Potchefstroom Campus, 2009.

Keywords

Speech recognition, Acoustic variability, Corpus design, Resource-scarce languages, Acoustic models, Model distances, Telephone ASR corpora

URI

http://hdl.handle.net/10394/3994

Collections

Engineering
