Unsupervised acoustic model training: comparing South African English and isiZulu
Publisher
IEEE
Abstract
Large amounts of untranscribed audio data are generated
every day. These audio resources can be used to develop robust
acoustic models for a variety of speech-based systems. Manually
transcribing this data is resource intensive, requiring funding,
time and expertise. Lightly-supervised training techniques,
however, provide a means to transcribe audio rapidly, thus
reducing the initial resource investment needed to begin the
modelling process.
Our findings suggest that the lightly-supervised training
technique works well for English, but when moving to an
agglutinative language, such as isiZulu, the process fails to
achieve the performance seen for English. Additionally, results
obtained with a phone-based approach are significantly worse than
those obtained with an approach using word-based language models.
These results indicate that lightly-supervised training techniques
depend strongly on large or well-matched text resources.
Citation
Neil Kleynhans, Febe de Wet and Etienne Barnard, “Unsupervised acoustic model training: comparing South African English and isiZulu”, in Proc. Annual Symp. Pattern Recognition Association of South Africa (PRASA), pp. 136-141, Port Elizabeth, South Africa, 2015. [http://engineering.nwu.ac.za/multilingual-speech-technologies-must/publications]
