Computational syntactic analysis of Setswana

Berg, Anna Susanna

Date

2018

Abstract

The main aim of this study is the computational syntactic analysis of the Setswana simple sentence, using Lexical Functional Grammar (LFG) as the framework and XLE as the associated grammar development platform. LFG consists of several parallel levels of representation, but for syntactic analysis the focus is on constituent (c-) structure and functional (f-) structure, two parallel, mutually constraining levels of syntactic representation. We provide a detailed exposition of Setswana grammar in terms of word categories, phrases and the simple sentence, with specific emphasis on nominal classification and concordial agreement, as well as on the verb as the morphologically most complex word category. We apply Lexical Mapping Theory (LMT), a sub-theory within LFG, to analyse the argument (a-) structure of the main verb, including the root and its extensions, in order to obtain the subcategorisation frames of the verb roots, as required in the XLE computational grammar lexicon. We also identify and analyse the immediate constituents of the simple sentence in terms of their phrasal structure and their grammatical functions. We use the rich XLE user interface to implement linguistic rules that model this grammar and constitute the XLE parser. We test the scope, coverage and accuracy of the parser with a systematically hand-crafted test suite that includes both grammatical and ungrammatical test items. We align the test suite with the linguistic structure of the Setswana simple sentence and its phrases in order to demonstrate the correctness of our grammar. Finally, we create a treebank, annotated with deep syntactic information, using the XLE interface. The treebank is the first of its kind for Setswana and could serve as a gold standard for testing and evaluating future Setswana parsers. Both our test suite and the treebank, provided in .lfg, .SExp and .pl (Prolog) format, are freely available.

URI

orcid.org/0000-0001-7596-4558
http://hdl.handle.net/10394/30634
