    Tswana finite state tokenisation

    Date
    2015
    Author
    Pretorius, Laurette
    Viljoen, Biffie
    Berg, Ansu
    Pretorius, Rigardt
    Abstract
    Tswana, a Bantu language in the Sotho group, is characterised by an agglutinative morphology and a disjunctive orthography, which mainly affects the verb category. In particular, verbal prefixes are usually written disjunctively, while suffixes follow a conjunctive writing style. Tswana tokenisation therefore cannot be based solely on whitespace, as it can in many alphabetic, segmented languages, including the conjunctively written Nguni group of South African Bantu languages. This paper shows how two finite state tokeniser transducers and a finite state morphological analyser are combined to solve the Tswana (verb) tokenisation problem. The approach has the important advantage of bringing the processing of Tswana, beyond the morphological analysis level, in line with what is appropriate for the Nguni languages. The challenge of the disjunctive orthography is thus met at the tokenisation/morphological analysis level and does not in principle propagate to subsequent levels of analysis such as POS tagging and shallow parsing. The tokenisation approach is novel and, when implemented and evaluated, yields an $F_1$-score of 95% with respect to a hand-tokenised gold standard.
    URI
    http://hdl.handle.net/10394/20600
    http://dx.doi.org/10.1007/s10579-014-9292-1
    Collections
    • Faculty of Humanities [2042]
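
    The abstract describes grouping whitespace-separated orthographic words into verb tokens and scoring the result against a hand-tokenised gold standard. The sketch below illustrates that general idea only. It is not the authors' finite state implementation: the greedy longest-match strategy, the function names, the `max_span` limit, the toy acceptance set standing in for a morphological analyser, and the span-based definition of $F_1$ are all assumptions made for illustration. The example string is chosen here and is not taken from the paper.

```python
from typing import Callable, List, Set, Tuple


def group_tokens(
    words: List[str],
    is_valid_token: Callable[[str], bool],
    max_span: int = 6,  # hypothetical upper bound on words per token
) -> List[str]:
    """Greedy longest-match grouping of adjacent orthographic words.

    `is_valid_token` is a stand-in for a morphological analyser: it
    should return True when the joined words form one linguistic token.
    """
    tokens: List[str] = []
    i = 0
    while i < len(words):
        # Try the longest window first, shrinking down to a single word.
        for j in range(min(len(words), i + max_span), i, -1):
            candidate = " ".join(words[i:j])
            # A single word is always accepted as a fallback token.
            if j - i == 1 or is_valid_token(candidate):
                tokens.append(candidate)
                i = j
                break
    return tokens


def token_spans(tokens: List[str]) -> Set[Tuple[int, int]]:
    """Map tokens to (start, end) indices over the orthographic words."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for tok in tokens:
        width = len(tok.split())
        spans.add((start, start + width))
        start += width
    return spans


def f1_score(predicted: List[str], gold: List[str]) -> float:
    """Span-based F1: a predicted token counts only on an exact span match."""
    pred_spans = token_spans(predicted)
    gold_spans = token_spans(gold)
    correct = len(pred_spans & gold_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy analyser: accepts exactly one disjunctively written verb form.
    toy_accept = {"ke a tsamaya"}
    words = "ke a tsamaya".split()
    gold = ["ke a tsamaya"]

    grouped = group_tokens(words, lambda s: s in toy_accept)
    print(grouped, f1_score(grouped, gold))
    print(words, f1_score(words, gold))  # whitespace-only baseline
```

    In this toy setting the whitespace-only baseline shares no token spans with the gold standard, which is the failure mode the abstract attributes to the disjunctive orthography.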

    Copyright © North-West University