NWU Institutional Repository

Unsupervised acoustic model training: comparing South African English and isiZulu


Publisher

IEEE

Abstract

Large amounts of untranscribed audio data are generated every day. These audio resources can be used to develop robust acoustic models for a variety of speech-based systems. Manually transcribing this data is resource-intensive, requiring funding, time and expertise. Lightly supervised training techniques, however, provide a means to transcribe audio rapidly, reducing the initial resource investment needed to begin the modelling process. Our findings suggest that lightly supervised training works well for English, but for an agglutinative language such as isiZulu the process fails to achieve the performance seen for English. Additionally, phone-based approaches perform significantly worse than an approach using word-based language models. These results indicate that lightly supervised training techniques depend strongly on large or well-matched text resources.

Citation

Neil Kleynhans, Febe de Wet and Etienne Barnard, “Unsupervised acoustic model training: comparing South African English and isiZulu”, in Proc. Annual Symp. Pattern Recognition Association of South Africa (PRASA), pp. 136–141, Port Elizabeth, South Africa, 2015. [http://engineering.nwu.ac.za/multilingual-speech-technologies-must/publications]
