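Sintaktiese herrangskikking as voorprosessering in die ontwikkeling van Engels na Afrikaanse statistiese masjienvertaalsisteem [Syntactic reordering as pre-processing in the development of an English to Afrikaans statistical machine translation system]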

Griesel, Marissa

dc.contributor.advisor	Pilon, S.
dc.contributor.advisor	Roux, J.C.
dc.contributor.author	Griesel, Marissa	en_US
dc.date.accessioned	2012-02-17T08:37:39Z
dc.date.available	2012-02-17T08:37:39Z
dc.date.issued	2011	en_US
dc.identifier.uri	http://hdl.handle.net/10394/5565
dc.description	Thesis (M.A. (Applied Language and Literary Studies))--North-West University, Potchefstroom Campus, 2011.
dc.description.abstract	Statistical machine translation into any of the resource-scarce South African languages generally results in low-quality output. Large amounts of training data are required to generate output of such a standard that it can ease the work of human translators when incorporated into a translation environment. Sufficiently large corpora often do not exist, so other techniques must be researched to improve the quality of the output. One of the methods in the international literature that has yielded good improvements in output quality applies syntactic reordering as pre-processing. This pre-processing aims to simplify the decoding process, because fewer changes need to be made during that stage of translation. Training also benefits, since the automatic word alignments can be drawn more easily when the word orders of the source and target languages are more similar. The pre-processing is applied to the source language training data as well as to the text that is to be translated. It takes the form of rules that recognise patterns in the tags and adapt the sentence structure accordingly. These tags are assigned to the source language side of the aligned parallel corpus by a syntactic analyser. In this research project, the technique is adapted for translation from English to Afrikaans and deals with the reordering of verbs, modals, the past tense construct, constructions with “to” and negation. The goal of these rules is to change the English (source language) structure to better resemble the Afrikaans (target language) structure. A thorough analysis of the output of the baseline system serves as the starting point. The errors that occur in the output are divided into categories, and each of the underlying constructs for English and Afrikaans is examined. This analysis of the output and the literature on syntax for the two languages are combined to formulate the linguistically motivated rules.
The module that performs the pre-processing is evaluated in terms of precision and recall, and these two measures are then combined into an F-score, a single number by which the module can be assessed. All three of these measures compare well to international standards. Furthermore, a comparison is made between the system that is enriched by the pre-processing module and a baseline system to which no extra processing is applied. This comparison is done by automatically calculating two metrics (BLEU and NIST scores), and both show an improvement. When the entire document is evaluated, the BLEU score increases from 0.4968 to 0.5741 (7.7 %) and the NIST score from 8.4515 to 9.4905 (10.4 %).	en_US
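The abstract describes rules that recognise patterns in part-of-speech tags and reorder the English source to resemble Afrikaans word order. The following is a minimal sketch of one such rule for modals, not the thesis's actual implementation: the function name `move_verb_after_modal`, the use of Penn Treebank tags (`MD`, `VB`), and the hand-tagged example sentence are all assumptions made for illustration.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)

# Assumed tag set: Penn Treebank. The thesis's own tag set is not stated here.
MODAL_TAG = "MD"  # modal verb, e.g. "will", "can"
VERB_TAG = "VB"   # base-form verb, e.g. "read"


def move_verb_after_modal(tokens: List[Token]) -> List[Token]:
    """Hypothetical rule: when a modal is directly followed by a base-form
    verb, move that verb to the end of the sentence (before any final
    punctuation), mimicking verb-final order after Afrikaans modals."""
    for i in range(len(tokens) - 1):
        if tokens[i][1] == MODAL_TAG and tokens[i + 1][1] == VERB_TAG:
            verb = tokens[i + 1]
            rest = tokens[: i + 1] + tokens[i + 2 :]
            # Keep sentence-final punctuation in last position.
            if rest and rest[-1][1] == ".":
                return rest[:-1] + [verb, rest[-1]]
            return rest + [verb]
    # No modal + verb pattern found: leave the sentence unchanged.
    return list(tokens)


# Hand-tagged example; a real system would obtain tags from a syntactic analyser.
sentence = [("I", "PRP"), ("will", "MD"), ("read", "VB"),
            ("the", "DT"), ("book", "NN"), (".", ".")]
reordered = move_verb_after_modal(sentence)
print(" ".join(word for word, _ in reordered))
```

The reordered English, "I will the book read .", follows the word order of the Afrikaans "Ek sal die boek lees." A rule like this would be applied both to the source side of the training corpus and to the text to be translated, as the abstract states.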
dc.publisher	North-West University
dc.subject	Statistiese masjienvertaling	en_US
dc.subject	Afrikaans	en_US
dc.subject	Engels	en_US
dc.subject	Sintaktiese herrangskikking	en_US
dc.subject	Voorprosessering	en_US
dc.subject	Statistical machine translation	en_US
dc.subject	English	en_US
dc.subject	Syntactic reordering	en_US
dc.subject	Pre-processing	en_US
dc.title	Sintaktiese herrangskikking as voorprosessering in die ontwikkeling van Engels na Afrikaanse statistiese masjienvertaalsisteem	afr
dc.type	Thesis	en_US
dc.description.thesistype	Masters	en_US
dc.contributor.researchID	11088478 - Roux, Justus Christiaan (Supervisor)

Files in this item

Name: Griesel_M.pdf
Size: 1.485Mb
Format: PDF

This item appears in the following Collection(s)

Humanities [2696]
