
    Pre-training a Transformer-Based Generative Model Using a Small Sepedi Dataset

    Ramalepe, SP. et al..pdf (1.152Mb)
    Date
    2025-01-25
    Author
    Ramalepe, Simon Phetole
    Modipa, Thipe I.
    Davel, Marelie H.
    Abstract
    Due to the scarcity of data in low-resourced languages, the development of language models for these languages has been very slow. Pre-trained language models have gained popularity in natural language processing, especially for developing domain-specific models for low-resourced languages. In this study, we investigate the impact of using occlusion-based techniques when training a language model for a text generation task. We curate two new datasets: the Sepedi monolingual (SepMono) dataset, drawn from several South African resources, and the Sepedi radio news (SepNews) dataset, drawn from the radio news domain. We use the SepMono dataset to pre-train transformer-based models with occlusion and non-occlusion pre-training techniques and compare their performance. The SepNews dataset is used specifically for fine-tuning. Our results show that the non-occlusion models perform better than the occlusion-based models in terms of validation loss and perplexity. However, analysis of the generated text using the BLEU score metric, which measures the quality of the generated text, shows a slightly higher BLEU score for the occlusion-based models than for the non-occlusion models.
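    The abstract reports both validation loss and perplexity. As a minimal sketch of how these two quantities are commonly related, assuming the loss is a mean token-level cross-entropy measured in nats (the record does not state how the authors computed it), perplexity is the exponential of that loss. The function names below are hypothetical and are not taken from the paper.

```python
import math


def mean_cross_entropy(token_probs):
    """Mean negative log-likelihood (in nats) of the reference tokens.

    token_probs: probabilities the model assigned to each correct token.
    Assumption: natural-log cross-entropy averaged over tokens.
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def perplexity_from_loss(loss_nats):
    """Perplexity as the exponential of the mean cross-entropy loss."""
    return math.exp(loss_nats)


# Illustrative values only, not results from the paper.
probs = [0.25, 0.25, 0.25, 0.25]
loss = mean_cross_entropy(probs)
ppl = perplexity_from_loss(loss)
```

    Under this definition, a model that assigns probability 0.25 to every reference token has a perplexity of 4, so a lower validation loss always implies a lower perplexity. BLEU is computed from n-gram overlap with reference text rather than from the loss, which is why the two kinds of metric can rank models differently.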
    URI
    http://hdl.handle.net/10394/42870
    Collections
    • Faculty of Natural and Agricultural Sciences [4855]

    Copyright © North-West University