The Analysis of the Sepedi-English Code-switched Radio News Corpus
dc.contributor.author | Ramalepe, Simon | |
dc.contributor.author | Modipa, Thipe I | |
dc.contributor.author | Davel, Marelie H | |
dc.date.accessioned | 2023-06-17T19:12:51Z | |
dc.date.available | 2023-06-17T19:12:51Z | |
dc.date.issued | 2022 | |
dc.description.abstract | Code-switching is a phenomenon that occurs mostly in multilingual countries, where multilingual speakers often switch between languages in their conversations. The unavailability of large-scale code-switched corpora hampers the development and training of language models for generating code-switched text. In this study, we explore the initial phase of collecting and creating a Sepedi-English code-switched corpus for generating synthetic news. Radio news was considered, and the frequency of code-switching in read news was analysed. We developed and trained a Transformer-based language model using the collected code-switched dataset. We observed that the frequency of code-switched data in the dataset was very low, at 1.1%. We complemented our dataset with the news headlines dataset to create a new dataset. Although the frequency was still low, the model obtained an optimal loss of 2.361 with an accuracy of 66%. | en_US |
dc.identifier.citation | Ramalepe, SM et al. 2022. The Analysis of the Sepedi-English Code-switched Radio News Corpus | en_US |
dc.identifier.uri | http://hdl.handle.net/10394/41783 | |
dc.language.iso | en | en_US |
dc.publisher | UP Journals | en_US |
dc.subject | Code-switching | en_US |
dc.subject | text generation | en_US |
dc.subject | radio news | en_US |
dc.subject | Transformers | en_US |
dc.subject | Sepedi | en_US |
dc.title | The Analysis of the Sepedi-English Code-switched Radio News Corpus | en_US |
dc.type | Article | en_US |
Files
Original bundle
- Name: Ramalepe, S. The Analysis of the Sepedi-English.pdf
- Size: 874.33 KB
- Format: Adobe Portable Document Format
License bundle
- Name: license.txt
- Size: 1.61 KB
- Format: Item-specific license agreed upon to submission