dc.contributor.advisor: Pilon, S.
dc.contributor.advisor: Roux, J.C.
dc.contributor.author: Griesel, Marissa [en_US]
dc.date.accessioned: 2012-02-17T08:37:39Z
dc.date.available: 2012-02-17T08:37:39Z
dc.date.issued: 2011 [en_US]
dc.identifier.uri: http://hdl.handle.net/10394/5565
dc.description: Thesis (M.A. (Applied Language and Literary Studies))--North-West University, Potchefstroom Campus, 2011.
dc.description.abstract: Statistical machine translation into any of the resource-scarce South African languages generally yields low-quality output. Large amounts of training data are required to produce output good enough to ease the work of human translators when incorporated into a translation environment. Sufficiently large corpora often do not exist, so other techniques must be researched to improve output quality. One method in the international literature that has yielded good improvements applies syntactic reordering as pre-processing. This pre-processing aims to simplify decoding, since fewer changes need to be made during translation. Training also benefits, because automatic word alignments can be drawn more easily when the word orders of the source and target languages are more similar. The pre-processing is applied to the source-language training data as well as to the text to be translated. It takes the form of rules that recognise patterns in the tags and adapt the structure accordingly. These tags are assigned to the source-language side of the aligned parallel corpus by a syntactic analyser. In this research project, the technique is adapted for translation from English to Afrikaans and deals with the reordering of verbs, modals, the past tense construct, constructions with “to”, and negation. The goal of these rules is to change the English (source language) structure to resemble the Afrikaans (target language) structure more closely. A thorough analysis of the output of the baseline system serves as the starting point. The errors that occur in the output are divided into categories, and each of the underlying constructs for English and Afrikaans is examined. This analysis of the output and the literature on the syntax of the two languages are combined to formulate the linguistically motivated rules.
The module that performs the pre-processing is evaluated in terms of precision and recall, and these two measures are then combined in the F-score, which gives a single number by which the module can be assessed. All three measures compare well to international standards. Furthermore, the system enriched by the pre-processing module is compared with a baseline system to which no extra processing is applied. This comparison is done by automatically calculating two metrics (BLEU and NIST scores), and it shows very positive results. When evaluating the entire document, an increase in the BLEU score from 0,4968 to 0,5741 (7,7 %) and in the NIST score from 8,4515 to 9,4905 (10,4 %) is reported. [en_US]
dc.publisher: North-West University
dc.subject: Statistiese masjienvertaling [en_US]
dc.subject: Afrikaans [en_US]
dc.subject: Engels [en_US]
dc.subject: Sintaktiese herrangskikking [en_US]
dc.subject: Voorprosessering [en_US]
dc.subject: Statistical machine translation [en_US]
dc.subject: English [en_US]
dc.subject: Syntactic reordering [en_US]
dc.subject: Pre-processing [en_US]
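dc.title: Sintaktiese herrangskikking as voorprosessering in die ontwikkeling van Engels na Afrikaanse statistiese masjienvertaalsisteem (English: Syntactic reordering as pre-processing in the development of an English-to-Afrikaans statistical machine translation system) [afr]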
dc.type: Thesis [en_US]
dc.description.thesistype: Masters [en_US]
dc.contributor.researchID: 11088478 - Roux, Justus Christiaan (Supervisor)
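
The abstract describes reordering rules that recognise patterns in the tags of the English source text and adapt its structure to resemble Afrikaans. The sketch below illustrates the general idea with a single toy rule for modals; it is not one of the thesis's actual rules. It assumes Penn Treebank-style tags (MD for a modal, VB for a base-form verb) and a simplified notion of clause boundary, and the function name reorder_modal_verb is hypothetical. The rule reflects the fact that Afrikaans places the main verb at the end of the clause after a modal ("I will read the book" corresponds to "Ek sal die boek lees").

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)

# Assumed tag set: Penn Treebank-style tags, chosen only for illustration.
MODAL_TAG = "MD"
VERB_TAG = "VB"
# Simplified clause boundaries: punctuation and coordinating conjunctions.
CLAUSE_END_TAGS = {".", ",", "CC"}


def reorder_modal_verb(tagged: List[Token]) -> List[Token]:
    """Move a base-form verb that directly follows a modal to the end of its clause.

    Hypothetical toy rule, not taken from the thesis.
    """
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == MODAL_TAG and out[i + 1][1] == VERB_TAG:
            # Find the first clause boundary after the verb (or the sentence end).
            end = i + 2
            while end < len(out) and out[end][1] not in CLAUSE_END_TAGS:
                end += 1
            # Remove the verb and re-insert it just before the boundary.
            verb = out.pop(i + 1)
            out.insert(end - 1, verb)
            i = end
        else:
            i += 1
    return out


example = [("I", "PRP"), ("will", "MD"), ("read", "VB"),
           ("the", "DT"), ("book", "NN"), (".", ".")]
reordered = reorder_modal_verb(example)
print(" ".join(word for word, _ in reordered))  # I will the book read .
```

In a pipeline of the kind the abstract outlines, a rule like this would be applied both to the source side of the training corpus and to each input sentence before decoding, so that the word order the system learns from matches the word order it sees at translation time.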

