NWU Institutional Repository

Multilingual pronunciations of proper names in a Southern African corpus

dc.contributor.author: Thirion, Jan W.F.
dc.contributor.author: Barnard, Etienne
dc.contributor.author: Davel, Marelie H.
dc.contributor.researchID: 23607955 - Davel, Marelie Hattingh
dc.contributor.researchID: 21021287 - Barnard, Etienne
dc.date.accessioned: 2014-11-04T06:04:33Z
dc.date.available: 2014-11-04T06:04:33Z
dc.date.issued: 2012
dc.description.abstract: We present our process for the development and analysis of a multilingual names corpus called Multipron-split. It is derived from Multipron, a corpus collected in previous work [1], in which names and speakers were drawn from four South African languages: Afrikaans, English, isiZulu and Sesotho. The new corpus is better suited to multilingual pronunciation modelling and research because each "word" consists of either a first name or a surname, rather than a combination of the two. This enables us to model pronunciations from a single language of origin, which has previously been shown to be important in pronunciation modelling for proper names. We present an algorithm through which the most common pronunciations of names, also called reference pronunciations, can be automatically extracted from the observed pronunciations. We show that the most common pronunciation variants correlate well with the different speaker languages, and that systematic phone substitutions occur when speakers of one language pronounce names from a different language. Also, reasonably accurate pronunciations can be generated with an automatic grapheme-to-phoneme converter, especially when the speaker language agrees with the name language. | en_US
dc.description.uri: http://www.prasa.org/index.php/2012-03-07-10-55-15
dc.identifier.citation: Thirion, J.W.F. & Davel, M.H., et al. 2012. Multilingual pronunciations of proper names in a Southern African corpus. Proceedings of the Twenty-Third Annual Symposium of the Pattern Recognition Association of South Africa. Pretoria. p. 102-108. [http://www.prasa.org/] | en_US
dc.identifier.isbn: 978-0-620-54601-0
dc.identifier.uri: http://hdl.handle.net/10394/12125
dc.language.iso: en | en_US
dc.publisher: Pattern Recognition Association of South Africa (PRASA) | en_US
dc.title: Multilingual pronunciations of proper names in a Southern African corpus | en_US
dc.type: Article | en_US
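The abstract states that reference pronunciations (the most common pronunciation variants of each name) are extracted automatically from the observed pronunciations, but this record does not describe how. As a rough illustration only, the Python sketch below uses a plain frequency count, which is not the authors' algorithm. The function name `reference_pronunciations`, the `(name, speaker_language, pronunciation)` tuple layout, the phone notation and the toy observations are all hypothetical and invented for this example.

```python
from collections import Counter, defaultdict


def reference_pronunciations(observations, top_n=1, by_speaker_language=False):
    """Return the top_n most frequent pronunciation variants per name.

    Hypothetical frequency-count baseline, not the Multipron-split algorithm.
    observations: iterable of (name, speaker_language, pronunciation) tuples,
    with pronunciation as a space-separated phone string (assumed format).
    If by_speaker_language is True, variants are counted separately for each
    (name, speaker_language) pair instead of pooling all speakers.
    """
    counts = defaultdict(Counter)
    for name, speaker_lang, pron in observations:
        key = (name, speaker_lang) if by_speaker_language else name
        counts[key][pron] += 1
    # most_common() sorts by count; equal counts keep first-seen order.
    return {key: [pron for pron, _ in c.most_common(top_n)]
            for key, c in counts.items()}


# Toy observations invented for illustration; phone strings are made up.
obs = [
    ("Thabo", "Sesotho", "t_h a b o"),
    ("Thabo", "English", "t a b o"),
    ("Thabo", "Sesotho", "t_h a b o"),
    ("Thabo", "Afrikaans", "t a b o"),
    ("Thabo", "isiZulu", "t_h a b o"),
]

print(reference_pronunciations(obs))  # {'Thabo': ['t_h a b o']}
per_lang = reference_pronunciations(obs, by_speaker_language=True)
print(per_lang[("Thabo", "English")])  # ['t a b o']
```

Raising `top_n` keeps several variants per name, and `by_speaker_language=True` counts variants separately per speaker group, which is one simple way to see how variants pattern with speaker language.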

Files

Original bundle

Name: prasa2012-17.pdf
Size: 166 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.61 KB
Format: Item-specific license agreed to upon submission