NWU Institutional Repository

The semi-automated creation of stratified speech corpora

dc.contributor.author: Van Heerden, Carel
dc.contributor.author: Barnard, Etienne
dc.contributor.author: Davel, Marelie H.
dc.contributor.researchID: 11539151 - Van Heerden, Carel Jacobus
dc.contributor.researchID: 23607955 - Davel, Marelie Hattingh
dc.contributor.researchID: 21021287 - Barnard, Etienne
dc.date.accessioned: 2014-11-03T13:59:55Z
dc.date.available: 2014-11-03T13:59:55Z
dc.date.issued: 2013
dc.description.abstract: Smartphones provide an efficient means for the collection of speech data; however, the quality of the corpora created in this fashion is not predictable. We describe an approach that allows us to post-process and rank utterances in a prompted speech corpus quickly and effectively. Utterance ranking makes it possible to both select those utterances with the highest likelihood of being correct and to evaluate the quality of the resulting corpus from a limited sample. This approach has been applied to a collection in the eleven official languages of South Africa, and we show that it naturally leads to the creation of stratified corpora from the same collection. Such corpora can be useful for different purposes, and corpus users are provided with the tools to extract these easily: from small, highly accurate corpora to larger corpora that are likely to contain more errors. (en_US)
dc.description.uri: http://www.prasa.org/index.php/2012-03-07-10-55-15
dc.identifier.citation: Van Heerden, C. & Davel, M.H., et al. 2013. The semi-automated creation of stratified speech corpora. In: Conference Proceedings of the 24th Annual Symposium of the Pattern Recognition Association of South Africa. Pretoria. p. 115-119. [http://www.prasa.org/] (en_US)
dc.identifier.isbn: 978-0-86970-771-5
dc.identifier.uri: http://hdl.handle.net/10394/12120
dc.language.iso: en (en_US)
dc.publisher: Pattern recognition association of South Africa (PRASA) (en_US)
dc.subject: Speech corpora (en_US)
dc.subject: Automatic speech recognition (en_US)
dc.subject: Confidence scoring (en_US)
dc.title: The semi-automated creation of stratified speech corpora (en_US)
dc.type: Other (en_US)

Files

Original bundle

Name: prasa2013-17.pdf
Size: 195.75 KB
Format: Adobe Portable Document Format
Description: The semi-automated creation of stratified speech corpora

License bundle

Name: license.txt
Size: 1.61 KB
Format: Item-specific license agreed upon to submission