G2P variant prediction techniques for ASR and STD
Authors
Davel, Marelie H.
van Heerden, Charl
Barnard, Etienne
Publisher
Interspeech 2013
Abstract
Introducing pronunciation variants into a lexicon is a balancing
act: incorporating necessary variants can improve automatic
speech recognition (ASR) and spoken term detection (STD)
performance by capturing some of the variability that occurs
naturally; introducing superfluous variants can lead to increased
confusability and a decrease in performance. We experiment
with two very different grapheme-to-phoneme variant prediction
techniques and analyze the variants generated, as well as
their effect when used within fairly standard ASR and STD systems
with unweighted lexicons. Specifically, we compare the
variants generated by joint sequence models, which use probabilistic
information to generate as many or as few variants as
required, with a more discrete approach: the use of pseudophonemes
within the default-and-refine algorithm. We evaluate
results using three of the 2013 Babel evaluation languages
with quite different variant characteristics – Tagalog, Pashto and
Turkish – and find that there are clear trends in how the number
and type of variants influence performance, and that the implications
for lexicon creation for ASR and STD are different.
Index Terms: pronunciation variants, speech recognition, spoken
term detection, grapheme-to-phoneme
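The idea of using probabilistic information to generate "as many or as few variants as required" can be illustrated with a minimal sketch. It assumes a G2P model that returns an n-best list of (pronunciation, probability) pairs and keeps every hypothesis whose probability is within a chosen fraction of the best one. The function name `select_variants`, the parameters `rel_threshold` and `max_variants`, and the example word with its probabilities are all hypothetical and chosen for illustration; the abstract does not state the selection criterion actually used in the paper.

```python
from typing import List, Tuple


def select_variants(
    nbest: List[Tuple[str, float]],
    rel_threshold: float = 0.5,
    max_variants: int = 3,
) -> List[str]:
    """Pick pronunciation variants from an n-best list.

    A hypothesis is kept when its probability is at least
    rel_threshold times the probability of the best hypothesis.
    The probabilities are then discarded, which yields entries
    for an unweighted lexicon. (Illustrative assumption, not the
    paper's documented procedure.)
    """
    if not nbest:
        return []
    # Rank hypotheses from most to least probable.
    ranked = sorted(nbest, key=lambda item: item[1], reverse=True)
    best_prob = ranked[0][1]
    kept = [pron for pron, prob in ranked if prob >= rel_threshold * best_prob]
    # Cap the number of variants to limit lexicon confusability.
    return kept[:max_variants]


# Made-up n-best list for a single word (not data from the paper).
nbest = [
    ("t ax m aa t ow", 0.55),
    ("t ax m ey t ow", 0.40),
    ("t ow m aa t ow", 0.05),
]

for pron in select_variants(nbest, rel_threshold=0.5):
    print(f"tomato\t{pron}")
```

Raising `rel_threshold` toward 1.0 keeps only the top hypothesis, while lowering it admits more variants; this is one simple way to explore the trade-off between coverage and confusability that the abstract describes.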
Citation
Marelie Davel, Charl van Heerden, and Etienne Barnard. “G2P variant prediction techniques for ASR and STD,” in Proc. Interspeech, pp. 1831-1835, Lyon, France, 2013. [http://engineering.nwu.ac.za/multilingual-speech-technologies-must/publications]