Automatic genre classification of English students' argumentative essays using support vector machines

Raaff, Sabrina

Files

raaff_sabrina(1).pdf (23.8 MB)

Date

2007

Authors

Raaff, Sabrina

Researcher ID

10095519 - Van Rooy, Albertus Jacobus (Supervisor)
10215484 - Van Huyssteen, Gerhardus Beukes (Supervisor)

Supervisors

Van Rooy, A.J.
Van Huyssteen, Gerhard B

Publisher

North-West University

Abstract

Automatic text classification refers to the classification of texts according to topic. A related task is the automatic classification of texts based on their stylistic aspects, such as automatic genre classification, where texts are classified according to their genre. This is the classification task that concerns this research project. The project examines the genre of the argumentative essay in order to develop a genre classifier, using an automatic genre classification approach, that categorises prototypical and non-prototypical argumentative essays by student writers as 'good' or 'bad' examples of the genre (binary classification). It is intended that this classifier will allow a senior marker (for example, a lecturer) to give student essays classified as 'good' (those that require less feedback and less expert correction) to junior markers (for example, teaching assistants). This would afford the senior marker time to pay more attention to essays of a 'poorer' quality. The corpus used for the research project comprises 346 argumentative essays drawn from a section of the British Academic Written English corpus and written by L1 English students. The data are composed of counts of linguistic features extracted from the texts. Once these features had been extracted, they were used to create four data sets: a raw data set, composed of raw feature frequencies; a data set composed of the feature set normalised for text length; a data set composed of inverse document frequency counts; and a data set composed of a logarithmic transformation of the feature frequencies. Various classifiers were built from these four data sets using a machine learning approach, in which a classifier is trained on previous examples in order to predict the class of future examples.
The project uses the STATISTICA implementation of support vector machines, the STATISTICA Support Vector Machine module (Statsoft, 2006). Support vector machine learning is used because this technique has been shown to perform well in automatic genre classification studies and other classification tasks.
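
The four data sets described above can be illustrated with a minimal sketch. The abstract does not give the exact formulas used in the thesis, so everything here is an assumption for illustration: the function name `build_data_sets`, the per-1,000-token scaling, the `log(N / df)` weighting for the inverse document frequency set, and the `log(1 + count)` transformation. The support vector machine training step itself (done in STATISTICA in the thesis) is not shown.

```python
import math


def build_data_sets(raw_counts, text_lengths):
    """Derive four feature representations from raw feature counts.

    raw_counts   -- one list of feature counts per text (hypothetical layout)
    text_lengths -- number of tokens in each text
    All formulas below are illustrative assumptions, not the thesis's own.
    """
    n_texts = len(raw_counts)
    n_features = len(raw_counts[0])

    # Data set 2: counts normalised for text length (per 1,000 tokens, assumed).
    normalised = [
        [1000.0 * count / length for count in row]
        for row, length in zip(raw_counts, text_lengths)
    ]

    # Data set 3: counts weighted by inverse document frequency.
    # doc_freq[j] is the number of texts in which feature j occurs at least once.
    doc_freq = [
        sum(1 for row in raw_counts if row[j] > 0) for j in range(n_features)
    ]
    idf = [math.log(n_texts / df) if df else 0.0 for df in doc_freq]
    idf_weighted = [
        [count * weight for count, weight in zip(row, idf)] for row in raw_counts
    ]

    # Data set 4: logarithmic transformation; log(1 + count) keeps zero counts at zero.
    log_transformed = [[math.log(1 + count) for count in row] for row in raw_counts]

    return {
        "raw": [list(row) for row in raw_counts],  # Data set 1: unchanged counts
        "normalised": normalised,
        "idf_weighted": idf_weighted,
        "log_transformed": log_transformed,
    }


# Two toy texts with two features each (made-up numbers, not thesis data).
data_sets = build_data_sets([[2, 0], [4, 1]], [100, 200])
```

Each of the four resulting matrices could then be passed, together with the 'good'/'bad' labels, to any support vector machine implementation to train and compare classifiers.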

Description

Thesis (M.A. (Afrikaans and Dutch))--North-West University, Potchefstroom Campus, 2008.

Keywords

Automatic Genre Classification/Recognition/Analysis, Automatic Text Classification, Information/Text Retrieval, Corpus Linguistics, Corpora, Computational Linguistics, Automatic Annotation, Machine Learning, Natural Language Processing

URI

http://hdl.handle.net/10394/1810

Collections

Humanities
