NWU Institutional Repository

Speech data collection in an under-resourced language within a multilingual context

dc.contributor.author: Molapo, Raymond
dc.contributor.author: Barnard, Etienne
dc.contributor.author: de Wet, Febe
dc.contributor.researchID: 21021287 - Barnard, Etienne
dc.date.accessioned: 2016-05-19T12:52:58Z
dc.date.available: 2016-05-19T12:52:58Z
dc.date.issued: 2014
dc.description.abstract: In this paper, we present an end-to-end solution to the development of an automatic speech recognition (ASR) system for typical under-resourced languages, where the target language is likely to be influenced by one or more embedded foreign languages. We first describe the collection and processing of the text corpus crawled from the World Wide Web using the Rapid Language Adaptation Toolkit. In particular, we highlight the challenges faced when foreign languages are embedded within the matrix language. Thereafter, we discuss our speech data collection efforts in under-resourced environments. Finally, we report on a strategy called transliteration that helps improve the recognition results of our grapheme-based automatic speech recognition system in the presence of embedded language words. [en_US]
dc.description.uri: http://www.mica.edu.vn/sltu2014/
dc.description.uri: http://mica.edu.vn/sltu2014/proceedings/35.pdf
dc.identifier.citation: Molapo, R. et al. 2014. Speech data collection in an under-resourced language within a multilingual context. (In: 4th International Workshop on Spoken Language Technologies for Under-resourced Languages, St Petersburg, Russia, 14-16 May. p. 238-242). [en_US]
dc.identifier.isbn: 978-5-8088-0908-6.
dc.identifier.uri: http://hdl.handle.net/10394/17362
dc.language.iso: en [en_US]
dc.publisher: International Research Institute MICA [en_US]
dc.subject: Under-resourced languages [en_US]
dc.subject: Matrix language [en_US]
dc.subject: Transliteration [en_US]
dc.subject: Grapheme-based ASR [en_US]
dc.title: Speech data collection in an under-resourced language within a multilingual context [en_US]
dc.type: Presentation [en_US]
