Automatic speech recognition for resource–scarce environments

Kleynhans, Neil Taylor

dc.contributor.advisor	Barnard, E.
dc.contributor.author	Kleynhans, Neil Taylor
dc.date.accessioned	2013-12-03T06:14:18Z
dc.date.available	2013-12-03T06:14:18Z
dc.date.issued	2013
dc.description	Thesis (PhD (Computer and Electronic Engineering))--North-West University, Potchefstroom Campus, 2013.
dc.description.abstract	Automatic speech recognition (ASR) technology has matured over the past few decades and has made significant impacts in a variety of fields, from assistive technologies to commercial products. However, ASR system development is a resource-intensive activity and requires language resources in the form of text-annotated audio recordings and pronunciation dictionaries. Unfortunately, many languages found in the developing world fall into the resource-scarce category, and this scarcity severely inhibits the deployment of ASR systems in the developing world. In this thesis we present research into developing techniques and tools to (1) harvest audio data, (2) rapidly adapt ASR systems and (3) select "useful" training samples in order to assist with resource-scarce ASR system development. We demonstrate an automatic audio harvesting approach which efficiently creates a speech recognition corpus by harvesting an easily available audio resource. We show that by starting with bootstrapped acoustic models, trained with language data obtained from a dialect, and then running through a few iterations of an alignment-filter-retrain phase, it is possible to create an accurate speech recognition corpus. As a demonstration we create a South African English speech recognition corpus by using our approach and harvesting an internet website which provides audio and approximate transcriptions. The acoustic models developed from the harvested data are evaluated on independent corpora, and the results show that the proposed harvesting approach provides a robust means to create ASR resources. As there are many acoustic model adaptation techniques which can be implemented by an ASR system developer, it becomes a costly endeavour to select the best adaptation technique.
We investigate how the performance of various adaptation techniques depends on the amount of adaptation data by systematically varying that amount and comparing the techniques. We establish a guideline which can be used by an ASR developer to choose the best adaptation technique given a size constraint on the adaptation data, for the scenario where adaptation between narrow- and wide-band corpora must be performed. In addition, we investigate the effectiveness of a novel channel normalisation technique and compare its performance with standard normalisation and adaptation techniques. Lastly, we propose a new data selection framework which can be used to design a speech recognition corpus. We show that, for limited data sets, independent of language and bandwidth, the most effective strategy for data selection is frequency-matched selection, and that the widely used maximum entropy methods generally produce the least promising results. In our model, the frequency-matched selection method corresponds to a logarithmic relationship between accuracy and corpus size; we also investigated other model relationships, and found that a hyperbolic relationship (as suggested by simple asymptotic arguments in learning theory) may lead to somewhat better performance under certain conditions.	en_US
dc.description.thesistype	Doctoral	en_US
dc.identifier.uri	http://hdl.handle.net/10394/9668
dc.language.iso	en	en_US
dc.publisher	North-West University
dc.subject	Automatic speech recognition	en_US
dc.subject	data harvesting	en_US
dc.subject	acoustic model adaptation	en_US
dc.subject	feature normalisation	en_US
dc.subject	data selection	en_US
dc.subject	corpus design	en_US
dc.subject	resource-scarce	en_US
dc.subject	language technology resource development	en_US
dc.title	Automatic speech recognition for resource–scarce environments	en
dc.type	Thesis	en_US

Files

Original bundle

Name: Kleynhans_NT.pdf
Size: 1.22 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.61 KB
Format: Item-specific license agreed upon to submission

Collections

Engineering