NWU Institutional Repository

Multilingual pronunciations of proper names in a Southern African corpus

Date

2012

Authors

Jan W.F. Thirion
Etienne Barnard
Marelie H. Davel

Publisher

Pattern Recognition Association of South Africa (PRASA)

Abstract

We present our process for the development and analysis of a multilingual names corpus, called Multipron-split. It is derived from Multipron, a corpus collected in previous work [1], in which names and speakers were drawn from four South African languages: Afrikaans, English, isiZulu and Sesotho. The new corpus is better suited to multilingual pronunciation modelling and research, because each "word" consists of either a name or a surname rather than a combination of the two. This lets us model pronunciations from a single language of origin, which has previously been shown to be important in pronunciation modelling for proper names. We present an algorithm that automatically extracts the most common pronunciations of names, also called reference pronunciations, from the observed pronunciations. We show that the most common pronunciation variants correlate well with the different speaker languages, and that systematic phone substitutions occur when speakers of one language pronounce names from a different language. Finally, reasonably accurate pronunciations can be generated with an automatic grapheme-to-phoneme converter, especially when the speaker language agrees with the name language.
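The abstract does not spell out how reference pronunciations are extracted. As a rough illustration only, the sketch below assumes each observed pronunciation is a space-separated phone string and picks a name's most frequent variant, breaking ties by the smallest total phone-level edit distance to all observations (a medoid). The function names, the phone notation and the tie-break rule are hypothetical simplifications, not the procedure described in the paper.

```python
from collections import Counter


def phone_edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance between two phone sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete pa
                curr[j - 1] + 1,           # insert pb
                prev[j - 1] + (pa != pb),  # substitute (or match at cost 0)
            ))
        prev = curr
    return prev[-1]


def reference_pronunciation(observed: list[str]) -> str:
    """Hypothetical rule: return the most frequent observed variant; on a tie,
    return the candidate with the smallest total edit distance to all
    observations."""
    counts = Counter(observed)
    top = max(counts.values())
    candidates = sorted(p for p, c in counts.items() if c == top)

    def total_distance(p: str) -> int:
        return sum(phone_edit_distance(p.split(), q.split()) for q in observed)

    return min(candidates, key=total_distance)


if __name__ == "__main__":
    # Made-up observations for a single name, in an invented phone notation.
    observed = ["t a b o", "t a b o", "t h a b o", "t a b u"]
    print(reference_pronunciation(observed))
```

The same phone-level edit distance could also serve as a simple way to compare generated pronunciations against reference ones, though the paper may use a different alignment or scoring method.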

Citation

Thirion, J.W.F., Barnard, E. & Davel, M.H. 2012. Multilingual pronunciations of proper names in a Southern African corpus. Proceedings of the Twenty-Third Annual Symposium of the Pattern Recognition Association of South Africa. Pretoria. pp. 102-108. [http://www.prasa.org/]
