The Analysis of the Sepedi-English Code-switched Radio News Corpus

Ramalepe, Simon; Modipa, Thipe I; Davel, Marelie H

dc.contributor.author	Ramalepe, Simon
dc.contributor.author	Modipa, Thipe I
dc.contributor.author	Davel, Marelie H
dc.date.accessioned	2023-06-17T19:12:51Z
dc.date.available	2023-06-17T19:12:51Z
dc.date.issued	2022
dc.identifier.citation	Ramalepe, SM et al. 2022. The Analysis of the Sepedi-English Code-switched Radio News Corpus	en_US
dc.identifier.uri	http://hdl.handle.net/10394/41783
dc.description.abstract	Code-switching is a phenomenon that occurs mostly in multilingual countries, where multilingual speakers often switch between languages in their conversations. The unavailability of large-scale code-switched corpora hampers the development and training of language models for the generation of code-switched text. In this study, we explore the initial phase of collecting and creating a Sepedi-English code-switched corpus for generating synthetic news. Radio news and the frequency of code-switching in read news were considered and analysed. We developed and trained a Transformer-based language model using the collected code-switched dataset. We observed that the frequency of code-switched data in the dataset was very low, at 1.1%. We complemented our dataset with the news headlines dataset to create a new dataset. Although the frequency was still low, the model obtained the optimal loss rate of 2,361 with an accuracy of 66%.	en_US
dc.language.iso	en	en_US
dc.publisher	UP Journals	en_US
dc.subject	Code-switching	en_US
dc.subject	text generation	en_US
dc.subject	radio news	en_US
dc.subject	Transformers	en_US
dc.subject	Sepedi	en_US
dc.title	The Analysis of the Sepedi-English Code-switched Radio News Corpus	en_US
dc.type	Article	en_US

Files in this item

Name: Ramalepe, S. The Analysis of the ...
Size: 874.3Kb
Format: PDF

This item appears in the following Collection(s)

Faculty of Engineering [1136]