dc.contributor.author | Ramalepe, Simon | |
dc.contributor.author | Modipa, Thipe I | |
dc.contributor.author | Davel, Marelie H | |
dc.date.accessioned | 2023-06-17T19:12:51Z | |
dc.date.available | 2023-06-17T19:12:51Z | |
dc.date.issued | 2022 | |
dc.identifier.citation | Ramalepe, SM et al. 2022. The Analysis of the Sepedi-English Code-switched Radio News Corpus | en_US |
dc.identifier.uri | http://hdl.handle.net/10394/41783 | |
dc.description.abstract | Code-switching is a phenomenon that occurs
mostly in multilingual countries where multilingual
speakers often switch between languages in
their conversations. The unavailability of large-scale
code-switched corpora hampers the development
and training of language models for the generation
of code-switched text. In this study, we
explore the initial phase of collecting and creating
a Sepedi-English code-switched corpus for generating
synthetic news. Radio news and the frequency
of code-switching on read news were considered
and analysed. We developed and trained a
Transformer-based language model using the collected
code-switched dataset. We observed that the
frequency of code-switched data in the dataset was
very low at 1.1%. We complemented our dataset with
the news headlines dataset to create a new dataset.
Although the frequency was still low, the model obtained
the optimal loss rate of 2,361 with an accuracy
of 66%. | en_US |
dc.language.iso | en | en_US |
dc.publisher | UP Journals | en_US |
dc.subject | Code-switching | en_US |
dc.subject | text generation | en_US |
dc.subject | radio news | en_US |
dc.subject | Transformers | en_US |
dc.subject | Sepedi | en_US |
dc.title | The Analysis of the Sepedi-English Code-switched Radio News Corpus | en_US |
dc.type | Article | en_US |