dc.contributor.advisor: Helberg, A S J
dc.contributor.advisor: Bosch, S
dc.contributor.author: Mzamo, Lulamile
dc.date.accessioned: 2017-02-21T10:12:11Z
dc.date.available: 2017-02-21T10:12:11Z
dc.date.issued: 2015
dc.identifier.uri: http://hdl.handle.net/10394/20448
dc.description: MIng (Computer and Electronic Engineering), North-West University, Potchefstroom Campus, 2016 [en_US]
dc.description.abstract: Human language resources (HLRs) and applications currently available in South Africa are of a very basic nature, with lemmatisation being one of the most basic. South African languages, except for English, are considered underdeveloped when it comes to HLRs. The work detailed in this thesis is the development of a lemmatiser for one such language, namely isiXhosa. The previous benchmark in isiXhosa lemmatisation, which achieved 79.28%, was a rule-based lemmatiser implemented for the development of isiXhosa lemmatisation data. That data was used in this study. IsiXhosa is one of the official South African languages of the Bantu language family that are classified as "resource scarce languages". With 8.1 million mother-tongue speakers, it is the second largest language in South Africa, after isiZulu. IsiXhosa is closely related to languages such as isiZulu, Siswati and isiNdebele, and the work done on it could easily be bootstrapped to these languages. A lexicalised probabilistic graphical lemmatiser, the IsiXhosa Graphical Lemmatiser (XGL), was investigated, designed, implemented and evaluated against two benchmark lemmatisers, the CST Lemmatiser and the LemmaGen lemmatiser. The investigation towards the XGL involved five objectives. The first objective was to establish good characteristics for an automatic lemmatiser for morphologically complex languages. This was achieved by reviewing existing research material on the lemmatisation of morphologically complex languages. To establish the most appropriate lemmas for isiXhosa in the context of natural language processing, a study of isiXhosa morphology was done, and appropriate lemmas for each word category were identified. Exploring the training data answered the objective of establishing what good data features are for an isiXhosa lemmatiser. The objective of designing an isiXhosa lemmatisation model was realised through the implementation of XGL.
The last objective, the evaluation of an isiXhosa lemmatisation model, was achieved through training and testing XGL and comparing it to the two benchmark lemmatisers, the CST Lemmatiser and the LemmaGen lemmatiser. XGL achieved a higher accuracy than both benchmark lemmatisers, with an accuracy rate of 83.19%. [en_US]
dc.language.iso: en [en_US]
dc.publisher: North-West University (South Africa), Potchefstroom Campus [en_US]
dc.subject: Natural Language Processing [en_US]
dc.subject: Human Language Technology [en_US]
dc.subject: Machine Learning [en_US]
dc.subject: Lemmatisation [en_US]
dc.subject: IsiXhosa [en_US]
dc.title: Evaluation of the performance of a machine learning lemmatiser for isiXhosa [en_US]
dc.type: Thesis [en_US]
dc.description.thesistype: Masters [en_US]

