G2P variant prediction techniques for ASR and STD

Date
2013
Author
Davel, Marelie H.
van Heerden, Charl
Barnard, Etienne
Abstract
Introducing pronunciation variants into a lexicon is a balancing
act: incorporating necessary variants can improve automatic
speech recognition (ASR) and spoken term detection (STD)
performance by capturing some of the variability that occurs
naturally; introducing superfluous variants can lead to increased
confusability and a decrease in performance. We experiment
with two very different grapheme-to-phoneme variant prediction
techniques and analyze the variants generated, as well as
their effect when used within fairly standard ASR and STD systems
with unweighted lexicons. Specifically, we compare the
variants generated by joint sequence models, which use probabilistic
information to generate as many or as few variants as
required, with a more discrete approach: the use of pseudophonemes
within the default-and-refine algorithm. We evaluate
results using three of the 2013 Babel evaluation languages
with quite different variant characteristics – Tagalog, Pashto and
Turkish – and find that there are clear trends in how the number
and type of variants influence performance, and that the implications
for lexicon creation for ASR and STD are different.
Index Terms: pronunciation variants, speech recognition, spoken
term detection, grapheme-to-phoneme
URI
http://www.isca-speech.org/archive/archive_papers/interspeech_2013/i13_1831.pdf
http://hdl.handle.net/10394/26503
Collections
- Faculty of Engineering