
    Trajectory modelling with limited speech data

    View/Open
    Badenhorst_JAC_2016.pdf (3.559Mb)
    Date
    2016
    Author
    Badenhorst, Jacob Andreas Cornelius
    Abstract
    State-of-the-art automatic speech recognition (ASR) systems are built using hundreds or even thousands of hours of speech data. Even then, high recognition accuracy is achievable only by carefully constraining the recognition domain. This reliance on large speech corpora remains a major challenge when building ASR systems for resource-constrained languages. The need for large corpora is partly due to the substantial variation observed in different spoken realisations of the same text, but co-articulation also plays a significant role. When building an ASR system, it is not sufficient to observe a large number of samples of each acoustic unit during training; it is necessary to observe sufficient samples appearing in contexts similar to those found in the test data. To obtain a better understanding of co-articulation effects, we analysed the behaviour of phones in context, using trajectory models. We developed a new model that captures the feature trajectories of acoustic unit transitions directly, and developed a way of representing the characteristic changes between different units. We found it beneficial to model these characteristic changes at the spectral rather than cepstral level, by extracting features directly from the filter bank. Applying auto-regressive moving-average (ARMA) filtering to smooth spectral energies before constructing cepstral features also improved the accuracy of trajectories. We experimented with different approaches to identify transition model alignments and selected techniques that allowed us to locate the characteristic changes between units with the required accuracy. We developed a new compact representation of speech units in context, estimating model parameters using the trajectory models. These models function at a sub-transitional level, enabling the construction of units that occur in unseen and rare contexts.
Applying this technique, it was possible to create synthetic samples of triphone contexts by first constructing diphone transitions and then concatenating these to form synthetic trajectories. We found that better acoustic models (producing higher likelihoods on unseen test data) could be developed by augmenting existing data with synthetic samples. When the samples were used to augment the training data in an end-to-end ASR system, promising results were obtained. A useful side effect is that the synthetic samples provide a new mechanism to improve cluster selection for unseen or rare phones during state-tying.
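
As an illustration of the smoothing step mentioned in the abstract, the sketch below applies an ARMA filter to log filter-bank energies along the time axis and then derives cepstral coefficients with a type-II DCT. The abstract does not give the filter form or order, so the recursion used here (the commonly used MVA-style ARMA filter) and the function names `arma_smooth`, `dct_cepstra`, and `smoothed_cepstra` are assumptions made for illustration and may differ from the thesis.

```python
import math


def arma_smooth(frames, order=2):
    """Smooth each filter-bank channel over time with an ARMA filter.

    frames: list of frames, each a list of log filter-bank energies.
    Assumed filter form (not taken from the thesis), with M = order:
        y[t] = (y[t-1] + ... + y[t-M] + x[t] + ... + x[t+M]) / (2M + 1)
    Frames within M of either edge are copied through unchanged.
    """
    n = len(frames)
    out = [list(f) for f in frames]
    for t in range(order, n - order):
        for c in range(len(frames[t])):
            # Auto-regressive part: previously smoothed outputs.
            past = sum(out[t - i][c] for i in range(1, order + 1))
            # Moving-average part: current and future raw inputs.
            future = sum(frames[t + j][c] for j in range(order + 1))
            out[t][c] = (past + future) / (2 * order + 1)
    return out


def dct_cepstra(frame, n_ceps):
    """Unnormalised type-II DCT of one frame of log energies."""
    n = len(frame)
    return [
        sum(e * math.cos(math.pi * k * (i + 0.5) / n) for i, e in enumerate(frame))
        for k in range(n_ceps)
    ]


def smoothed_cepstra(frames, order=2, n_ceps=3):
    """Smooth spectral energies first, then build cepstral features."""
    return [dct_cepstra(f, n_ceps) for f in arma_smooth(frames, order)]
```

Smoothing before the DCT means each cepstral coefficient is computed from energies that have already been smoothed over time, which matches the order of operations the abstract describes. The filter order and the number of cepstral coefficients are free parameters in this sketch.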
    URI
    http://hdl.handle.net/10394/17839
    Collections
    • Engineering [1343]

    Copyright © North-West University