    Efficient harvesting of Internet audio for resource-scarce ASR

    Date
    2011
    Author
    Davel, Marelie H.
    van Heerden, Charl
    Kleynhans, Neil
    Barnard, Etienne
    Abstract
    Spoken recordings that have been transcribed for human reading (e.g. as captions for audiovisual material, or to provide alternative modes of access to recordings) are widely available in many languages. Such recordings and transcriptions have proven to be a valuable source of ASR data in well-resourced languages, but have not been exploited to a significant extent in under-resourced languages or dialects. Techniques used to harvest such data typically assume the availability of a fairly accurate ASR system, which is generally not available when working with resource-scarce languages. In this work, we define a process whereby an ASR corpus is bootstrapped using unmatched ASR models in conjunction with speech and approximate transcriptions sourced from the Internet. We introduce a new segmentation technique based on the use of a phone-internal garbage model, and demonstrate how this technique (combined with limited filtering) can be used to develop a large, high-quality corpus in an under-resourced dialect with minimal effort.
    URI
    https://researchspace.csir.co.za/dspace/bitstream/handle/10204/5769/Davel_2011.pdf?sequence=1&isAllowed=y
    http://hdl.handle.net/10394/26541
    Collections
    • Faculty of Engineering [1136]

    Copyright © North-West University