Towards an unsupervised morphological segmenter for isiXhosa
Authors
Mzamo, Lulamile
Helberg, Albert
Bosch, Sonja
Publisher
IEEE
Abstract
In this paper, branching entropy techniques and
isiXhosa language heuristics are adapted to develop unsupervised
morphological segmenters for isiXhosa. An overview of isiXhosa
segmentation issues is given, followed by a discussion of previous
work in automated segmentation, and in the segmentation of isiXhosa
in particular. Two unsupervised isiXhosa segmenters are
presented and compared to a random minimum baseline and
Morfessor-Baseline, a standard in unsupervised word
segmentation. Morfessor-Baseline outperforms both isiXhosa
segmenters, with a boundary identification accuracy of 79.10%.
The performance of the IsiXhosa Branching Entropy Segmenter
(XBES) varies with the segmentation mode used, reaching a
maximum of 73.39%. The IsiXhosa Heuristic Maximum
Likelihood Segmenter (XHMLS) achieves 72.42%. The study
suggests that unsupervised isiXhosa morphological segmentation
is feasible, given better optimization of the current attempt.
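
For readers unfamiliar with branching entropy, the general idea can be sketched as follows. This is a minimal illustration of the generic technique, not the XBES implementation described in the paper: the function names, the end-of-word marker, the entropy-rise rule and the threshold value are all assumptions made for the example, and the word list is a toy vocabulary rather than isiXhosa data.

```python
import math
from collections import Counter, defaultdict


def successor_counts(words):
    """Map each word prefix to a Counter of the characters that follow it."""
    counts = defaultdict(Counter)
    for word in words:
        for i in range(len(word)):
            counts[word[:i]][word[i]] += 1
        counts[word]["#"] += 1  # assumed end-of-word marker
    return counts


def branching_entropy(counts, prefix):
    """Shannon entropy (in bits) of the next-character distribution after prefix."""
    successors = counts.get(prefix)
    if not successors:
        return 0.0
    total = sum(successors.values())
    return -sum((n / total) * math.log2(n / total) for n in successors.values())


def segment(word, counts, threshold=1.0):
    """Place a boundary after a prefix whose entropy rises by at least threshold.

    The rise rule and the default threshold are hypothetical choices for this sketch.
    """
    pieces, start = [], 0
    prev = branching_entropy(counts, word[:1])
    for i in range(2, len(word)):
        cur = branching_entropy(counts, word[:i])
        if cur - prev >= threshold:
            pieces.append(word[start:i])
            start = i
        prev = cur
    pieces.append(word[start:])
    return pieces


# Toy vocabulary: four words sharing the prefix "ab".
toy_counts = successor_counts(["abx", "aby", "abz", "abw"])
print(segment("abx", toy_counts))
```

In the toy vocabulary the next character is fully predictable after "a" but uniformly spread over four options after "ab", so the entropy jumps at that point and the sketch places a boundary there.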
Citation
Mzamo, L. et al. 2019. Towards an unsupervised morphological segmenter for isiXhosa. Proceedings, 2019 Southern African Universities Power Engineering Conference/Robotics and Mechatronics/Pattern Recognition Association of South Africa (SAUPEC/RobMech/PRASA), Bloemfontein, South Africa, 28-30 Jan. Article no 8704816:166-170. [https://doi.org/10.1109/RoboMech.2019.8704816]