Exploring minimal pronunciation modeling for low resource languages
Publisher
IOS Press Inc
Abstract
Pronunciation lexicons can range from fully graphemic (modeling each word using the orthography directly) to fully phonemic (first mapping each word to a phoneme string). Between these two options lies a continuum of modeling options. We analyze techniques that can improve the accuracy of a graphemic system without requiring significant effort to design or implement. The analysis is performed in the context of the IARPA Babel project, which aims to develop spoken term detection systems for previously unseen languages rapidly, and with minimal human effort. We consider techniques related to letter-to-sound mapping and language-independent syllabification of primarily graphemic systems, and discuss results obtained for six languages: Cebuano, Kazakh, Kurmanji Kurdish, Lithuanian, Telugu and Tok Pisin.
Citation
Marelie Davel, Damianos Karakos, Etienne Barnard, Charl van Heerden, Richard Schwartz, Stavros Tsakalidis and William Hartmann, “Exploring minimal pronunciation modeling for low resource languages”, in Proc. Interspeech, pp. 538-542, Dresden, Germany, 2015. [http://engineering.nwu.ac.za/multilingual-speech-technologies-must/publications]
URI
https://books.google.co.za/books?id=-RGhDQAAQBAJ&pg=PA44&lpg=PA44&dq=Exploring+minimal+pronunciation+modeling+for+low+resource+languages&source=bl&ots=wAYDYAm_Ju&sig=ha5BMCtwoEBjHQTAkyauz2wSSEc&hl=en&sa=X&ved=0ahUKEwjFwPDv1M3ZAhUlKsAKHXrICPkQ6AEIODAC#v=onepage&q=Exploring%20minimal%20pronunciation%20modeling%20for%20low%20resource%20languages&f=false
https://www.lti.cs.cmu.edu/sites/default/files/sitaram%2C%20sunayana.pdf
http://hdl.handle.net/10394/26488