NWU Institutional Repository

Automatic speech recognition of poor quality audio using generative adversarial networks

dc.contributor.advisor: Davel, M.H.
dc.contributor.author: Heymans, Walter
dc.contributor.researchID: 23607955 - Davel Marelie Hattingh (Supervisor)
dc.date.accessioned: 2022-07-19T06:48:39Z
dc.date.available: 2022-07-19T06:48:39Z
dc.date.issued: 2022
dc.description: MEng (Computer and Electronic Engineering), North-West University, Potchefstroom Campus [en_US]
dc.description.abstract: In this study, we investigate the use of generative adversarial networks (GANs) to improve speech recognition performance of poor quality audio obtained from a real-world source. A GAN is developed to transform acoustic features of noisy audio prior to downstream acoustic modelling. The system utilises a baseline acoustic model trained on good quality data to improve the performance on mismatched data. This is achieved without requiring manual creation of parallel datasets. The practical relevance of the GAN is realised when a strong commercial-grade speech recognition system (which has already been optimised for a given set of conditions) is required to decode new mismatched data. The GAN can then act as a front-end to the existing system. We compare the GAN-based front-end to multi-style training (MTR) on three datasets in a controlled environment. The GAN system is much faster to train than a comparable MTR system with similar performance. The developed GAN is applied to a South African call centre dataset and achieves consistent improvements over a baseline model. Therefore, this provides a practical approach to improve ASR systems in mismatched environments. [en_US]
dc.description.thesistype: Masters [en_US]
dc.identifier.uri: https://orcid.org/0000-0003-2375-2371
dc.identifier.uri: http://hdl.handle.net/10394/39346
dc.language.iso: en [en_US]
dc.publisher: North-West University (South Africa). [en_US]
dc.subject: Automatic speech recognition [en_US]
dc.subject: Generative adversarial networks [en_US]
dc.subject: Multi-style training [en_US]
dc.subject: Call centre audio [en_US]
dc.subject: WAV49 encoding [en_US]
dc.title: Automatic speech recognition of poor quality audio using generative adversarial networks [en_US]
dc.type: Thesis [en_US]

Files

Original bundle

Name:
Heymans W Final.pdf
Size:
3.58 MB
Format:
Adobe Portable Document Format

License bundle

Name:
license.txt
Size:
1.61 KB
Format:
Item-specific license agreed upon to submission