Rapid development of TTS corpora for four South African languages
Authors
Van Niekerk, Daniel R.
van Heerden, Charl
Kleynhans, Neil
Kjartansson, Oddur
Jansche, Martin
Ha, Linne
Davel, Marelie H.
Publisher
Interspeech 2017
Abstract
This paper describes the development of text-to-speech corpora
for four South African languages. The approach investigated
the feasibility of low-cost methods, including informal
recording environments and untrained volunteer speakers.
This objective and the additional future goal of expanding
the corpus to increase coverage of South Africa’s 11 official
languages necessitated experimenting with multi-speaker and
code-switched data. The process and relevant observations are
detailed throughout. The latest versions of the corpora are available
for download under an open-source license and will likely
see further development and refinement in future.
Index Terms: text-to-speech corpus, under-resourced languages
Citation
Daniel Rudolph van Niekerk, Charl van Heerden, Marelie Davel, Neil Kleynhans, Oddur Kjartansson, Martin Jansche and Linne Ha, “Rapid development of TTS corpora for four South African languages”, in Proc. Interspeech, pp. 2178-2182, Stockholm, Sweden, 2017. [http://engineering.nwu.ac.za/multilingual-speech-technologies-must/publications]