The South African directory enquiries (SADE) name corpus

dc.contributor.author	Thirion, Jan Willem Frederick
dc.contributor.author	Van Heerden, Charl Johannes
dc.contributor.author	Giwa, Oluwapelumi
dc.contributor.author	Davel, Marelie Hattingh
dc.date.accessioned	2021-03-17T15:22:41Z
dc.date.available	2021-03-17T15:22:41Z
dc.date.issued	2020
dc.identifier.issn	1574-020X
dc.identifier.uri	http://hdl.handle.net/10394/36913
dc.description.abstract	We present the design and development of a South African directory enquiries (DE) corpus. It contains audio and orthographic transcriptions of a wide range of South African names produced by first language speakers of four languages, namely Afrikaans, English, isiZulu and Sesotho. Useful as a resource to understand the effect of name language and speaker language on pronunciation, this is the first corpus to also aim to identify the “intended language”: an implicit assumption with regard to word origin made by the speaker of the name. We describe the design, collection, annotation, and verification of the corpus. This includes an analysis of the algorithms used to tag the corpus with meta information that may be beneficial to pronunciation modelling tasks.	en_US
dc.language.iso	en	en_US
dc.publisher	Springer	en_US
dc.subject	Speech corpus collection	en_US
dc.subject	Pronunciation modeling	en_US
dc.subject	Speech recognition	en_US
dc.subject	Proper names	en_US
dc.title	The South African directory enquiries (SADE) name corpus	en_US
dc.type	Article	en_US