A distributed approach to speech resource collection
De Wet, Febe
We describe the integration of several tools to enable the end-to-end development of an Automatic Speech Recognition (ASR) system in a typical under-resourced language. Google App Engine is employed as the core environment for data verification, storage and distribution, and is used in conjunction with existing tools for gathering text and for recording speech data. We analyse the data acquired by each of the tools and develop an ASR system in Shona, an important under-resourced language of Southern Africa. Although unexpected logistical problems complicated the process, we were able to collect a usable Shona speech corpus for the development of the first ASR system in that language.