Speech recognition for under-resourced languages: Data sharing in hidden Markov model systems

De Wet, Febe; Kleynhans, Neil Taylor; Van Compernolle, Dirk; Sahraeian, Reza

dc.contributor.author	De Wet, Febe
dc.contributor.author	Kleynhans, Neil Taylor
dc.contributor.author	Van Compernolle, Dirk
dc.contributor.author	Sahraeian, Reza
dc.date.accessioned	2018-02-26T13:46:34Z
dc.date.available	2018-02-26T13:46:34Z
dc.date.issued	2017
dc.description.abstract	For purposes of automated speech recognition in under-resourced environments, techniques used to share acoustic data between closely related or similar languages become important. Donor languages with abundant resources can potentially be used to increase the recognition accuracy of speech systems developed in the resource-poor target language. The assumption is that adding more data will increase the robustness of the statistical estimations captured by the acoustic models. In this study we investigated data sharing between Afrikaans and Flemish, an under-resourced and a well-resourced language, respectively. Our approach focused on exploring model adaptation and refinement techniques associated with hidden Markov model-based speech recognition systems to increase the benefit of sharing data. Specifically, we focused on the use of currently available techniques, some possible combinations and the exact utilisation of the techniques during the acoustic model development process. Our findings show that simply using normal approaches to adaptation and refinement does not result in any benefits when adding Flemish data to the Afrikaans training pool. The only observed improvement was achieved when developing acoustic models on all available data but estimating model refinements and adaptations on the target data only. Significance: acoustic modelling for under-resourced languages; automatic speech recognition for Afrikaans; data sharing between Flemish and Afrikaans to improve acoustic modelling for Afrikaans.	en_US
dc.description.sponsorship	This research was supported by the South African National Research Foundation (grant no. UID73933), the Fund for Scientific Research of Flanders (FWO) under project AMODA (GA122.10N) as well as a grant from the joint Programme of Collaboration on HLT funded by the Nederlandse Taalunie and the South African Department of Arts and Culture.	en_US
dc.identifier.citation	Febe de Wet, Neil Kleynhans, Dirk van Compernolle and Reza Sahraeian, “Speech recognition for under-resourced languages: data sharing in hidden Markov model systems”, South African Journal of Science, Vol 113, No 1/2, pp 25-33, January 2017 [http://engineering.nwu.ac.za/sites/engineering.nwu.ac.za/files/files/v-must/Publications/Publications%202017/dewet-2017-SpeechRecognition.pdf]	en_US
dc.identifier.uri	http://hdl.handle.net/10394/26438
dc.identifier.uri	http://ieeexplore.ieee.org/document/7707303/
dc.identifier.uri	http://www.scielo.org.za/scielo.php?script=sci_arttext&pid=S0038-23532017000100009
dc.language.iso	en	en_US
dc.publisher	South African Journal of Science	en_US
dc.subject	acoustic modelling	en_US
dc.subject	Afrikaans	en_US
dc.subject	Flemish	en_US
dc.subject	automatic speech recognition	en_US
dc.title	Speech recognition for under-resourced languages: Data sharing in hidden Markov model systems	en_US
dc.type	Article	en_US

Files

Original bundle

Name: dewet-2017-speech-recognition.pdf
Size: 811.22 KB
Format: Adobe Portable Document Format
Description: dewet-2017-speech-recognition

Collections

Faculty of Engineering