A Southern African corpus for multilingual name pronunciation
Abstract
We describe the challenges that arise in predicting
the pronunciations of proper names in a multilingual society.
To improve our understanding of this issue, which
is of significant practical importance for applications of speech
technology, we have designed and collected a multilingual
corpus of proper names. Both the names and the speakers
are drawn from four South African languages, namely isiZulu,
Sesotho, English and Afrikaans. We describe how the corpus was
designed to probe the interaction between the speaker’s
language and the origin of the name, and discuss the practical
steps taken in collecting the spoken utterances. A
statistical investigation of the prompt material reveals some of
the systematic differences between the languages.
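The abstract does not say which statistics were computed on the prompt material. Purely as an illustration of how orthographic differences between name sets can be quantified, the sketch below assumes character-bigram distributions (with `#` marking name boundaries) compared using the Jensen-Shannon divergence. The function names `bigram_distribution` and `jensen_shannon` are hypothetical, and the example names are placeholders, not entries from the corpus.

```python
from collections import Counter
from math import log2


def bigram_distribution(names):
    """Relative frequencies of character bigrams; '#' pads each name's boundaries."""
    counts = Counter()
    for name in names:
        padded = f"#{name.lower()}#"
        counts.update(padded[i:i + 2] for i in range(len(padded) - 1))
    total = sum(counts.values())
    return {bigram: c / total for bigram, c in counts.items()}


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint support)."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_m(a):
        # Only bigrams observed in `a` contribute; m[k] > 0 wherever a[k] > 0.
        return sum(a[k] * log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


# Hypothetical placeholder name lists, chosen only to illustrate the calculation.
zulu_like = ["Thandeka", "Sibusiso", "Nkosinathi"]
afrikaans_like = ["Pieter", "Johannes", "Willem"]

p = bigram_distribution(zulu_like)
q = bigram_distribution(afrikaans_like)
print(f"JS divergence: {jensen_shannon(p, q):.3f} bits")
```

The Jensen-Shannon divergence is used here because it is symmetric and bounded, which makes pairwise comparisons between several languages easy to tabulate; other measures (for example, per-language letter or syllable statistics) would be equally reasonable choices.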
URI
https://www.researchgate.net/publication/235425724_A_Southern_African_corpus_for_multilingual_name_pronunciation
http://hdl.handle.net/10394/26543