The South African directory enquiries (SADE) name corpus
dc.contributor.author | Thirion, Jan Willem Frederick | |
dc.contributor.author | Van Heerden, Charl Johannes | |
dc.contributor.author | Giwa, Oluwapelumi | |
dc.contributor.author | Davel, Marelie Hattingh | |
dc.date.accessioned | 2021-03-17T15:22:41Z | |
dc.date.available | 2021-03-17T15:22:41Z | |
dc.date.issued | 2020 | |
dc.identifier.issn | 1574-020X | |
dc.identifier.uri | http://hdl.handle.net/10394/36913 | |
dc.description.abstract | We present the design and development of a South African directory enquiries (DE) corpus. It contains audio and orthographic transcriptions of a wide range of South African names produced by first language speakers of four languages, namely Afrikaans, English, isiZulu and Sesotho. Useful as a resource to understand the effect of name language and speaker language on pronunciation, this is the first corpus to also aim to identify the “intended language”: an implicit assumption with regard to word origin made by the speaker of the name. We describe the design, collection, annotation, and verification of the corpus. This includes an analysis of the algorithms used to tag the corpus with meta information that may be beneficial to pronunciation modelling tasks. | en_US |
dc.language.iso | en | en_US |
dc.publisher | Springer | en_US |
dc.subject | Speech corpus collection | en_US |
dc.subject | Pronunciation modeling | en_US |
dc.subject | Speech recognition | en_US |
dc.subject | Proper names | en_US |
dc.title | The South African directory enquiries (SADE) name corpus | en_US |
dc.type | Article | en_US |
This item appears in the following Collection(s)
- Faculty of Engineering