Automatic Recognition of Code-Switched Speech in Sepedi
Modipa, Thipe Isaiah
Code switching (CS) is a natural phenomenon that is often observed in multilingual speakers. These speakers take words, phrases or sentences from foreign languages and embed them in sentences in the primary language. Automatic speech recognition (ASR) systems find code-switched speech difficult to process, and ASR performance is known to degrade in CS environments. We study the Sepedi/English CS phenomenon in the context of Sepedi ASR. Using experimentation, data collection and quantitative data analysis, we analyse techniques that can be used to model code-switched speech effectively in resource-scarce environments. The focus is on techniques that modify the pronunciation dictionary in order to improve recognition accuracy. For this purpose, three new speech resources are designed, collected and curated: (1) the Radio Broadcast Corpus contains real examples of code switching as observed during radio broadcasts; (2) the Sepedi Prompted Code-Switched (SPCS) Corpus is based on true code-switching prompts, with each individual prompt recorded by multiple speakers in order to capture the pronunciation variability occurring in code-switched speech; and (3) the National Center for Human Language Technology (NCHLT) Sepedi-English code-switched subset (NSECSS) corpus contains not naturally occurring code-switched speech but English as spoken by Sepedi speakers. The latter corpus is particularly useful because its recording conditions and format match those of two related corpora: English produced by English speakers and Sepedi produced by Sepedi speakers. As part of corpus development, resource collection and analysis tools were developed and evaluated. Using these corpora, the implications of code-switched speech for ASR systems were evaluated. Various approaches to pronunciation modelling of code-switched speech were investigated, and a novel method for pronunciation prediction was developed.
This new variant-selection approach to modelling code-switched speech requires a two-step process: after grapheme-to-phoneme prediction of foreign words, phoneme-to-phoneme prediction (mapping the foreign phonemes to in-language phonemes) takes into account not only phoneme identity but also graphemic context. A practical implementation of such an algorithm performed well during recognition experiments, both as a single approach and in combination with other existing approaches. The best overall results were obtained when multiple variants were generated per CS word and variant selection was included in this process. Although applied specifically to the Sepedi/English task, the methods themselves are language-independent. In addition, the forms and frequency of code switching observed among Sepedi speakers, and the reasons for it, were studied using corpus analysis. Among other results, it was found that the prevalence of code switching within naturally occurring Sepedi speech was much higher than initially anticipated, making this a task well worth studying.
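
The two-step process described above can be illustrated with a minimal sketch. It is not the implementation from the thesis: the toy lexicon, the phoneme symbols, both mapping tables and all function names (`g2p`, `p2p`, `pronunciation_variants`) are hypothetical and chosen only for illustration. Step one is replaced by a lookup that returns aligned (grapheme, English phoneme) pairs; step two maps each English phoneme to an in-language phoneme, consulting the grapheme that produced it before falling back to a mapping based on phoneme identity alone. The variant-selection step itself is not modelled here.

```python
from typing import Dict, List, Tuple

# One aligned unit: (grapheme chunk, foreign phoneme). Assumed toy format.
Aligned = List[Tuple[str, str]]

# Step 1 stand-in: a hypothetical toy lexicon instead of a trained G2P model.
TOY_G2P: Dict[str, Aligned] = {
    "about": [("a", "@"), ("b", "b"), ("ou", "aU"), ("t", "t")],
    "doctor": [("d", "d"), ("o", "Q"), ("c", "k"), ("t", "t"), ("or", "@")],
}

# Step 2 tables (illustrative values only, not taken from the thesis).
# Fallback: map by phoneme identity alone.
DEFAULT_P2P: Dict[str, str] = {"@": "a", "aU": "a u", "Q": "o"}
# Context-sensitive: map by (foreign phoneme, grapheme chunk that produced it).
CONTEXT_P2P: Dict[Tuple[str, str], str] = {("@", "or"): "o"}


def g2p(word: str) -> Aligned:
    """Return aligned grapheme/phoneme pairs for a foreign word (toy lookup)."""
    return TOY_G2P[word.lower()]


def p2p(aligned: Aligned, use_context: bool = True) -> List[str]:
    """Map foreign phonemes to in-language phonemes.

    With use_context=True the grapheme is consulted first; otherwise only
    the phoneme identity is used. Unlisted phonemes pass through unchanged.
    """
    result: List[str] = []
    for grapheme, phoneme in aligned:
        target = CONTEXT_P2P.get((phoneme, grapheme)) if use_context else None
        if target is None:
            target = DEFAULT_P2P.get(phoneme, phoneme)
        # A target may expand to several in-language phonemes, e.g. "a u".
        result.extend(target.split())
    return result


def pronunciation_variants(word: str) -> List[List[str]]:
    """Collect the distinct pronunciations from both mapping strategies."""
    aligned = g2p(word)
    variants: List[List[str]] = []
    for use_context in (True, False):
        candidate = p2p(aligned, use_context)
        if candidate not in variants:
            variants.append(candidate)
    return variants


if __name__ == "__main__":
    for word in ("about", "doctor"):
        for variant in pronunciation_variants(word):
            print(word, " ".join(variant))
```

In this sketch the final schwa of "doctor" is mapped differently depending on whether its spelling is consulted, so the word receives two candidate pronunciations, while "about" receives one. A real system would learn both mappings from data and then choose among the generated variants.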