Authorship identification and microblogs: an explorative forensic analysis in a digital era
We currently find ourselves in the era of Web 3.0 (the Semantic Web) and Web 4.0 (the Symbiotic Web). People are now able not only to share content with each other but also to create it themselves. The mobile nature of Web 3.0 and 4.0, that is, the fact that users access the internet via various devices and no longer depend solely on a computer, has changed social interaction and communication. People are moving away from more traditional communication media and are finding news, information, and companionship online. The role of the user therefore appears to have changed: a user should now be viewed as an online profile through which newsworthy events can be shared almost instantly. Communication now also has to be short, since the reader wants to know what is happening at a glance. These new technologies are consequently not only changing the ways in which communication takes place and language is used to convey a message, but also creating an ideal opportunity for negative communication (slander, bullying, and trolling) and for fake profiles. Given the increasing number of anonymous profiles, one has to ask who is truly speaking, and whether the author of a short electronic text (in Afrikaans) can be identified. Against the broad background of the digital era, this explorative forensic analysis investigated the possibility of identifying the authors of microblog entries (on Twitter), while also attempting to lay bare the characteristics of the Afrikaans language found on these social media platforms. To conduct this research, a theoretical overview was compiled in which key concepts such as language, forensic linguistics, and corpus linguistics were investigated.
The overview also presented the existing methods used to identify the authors of short internet texts in other languages, such as English, as well as the changes that language undergoes in order to communicate successfully on these social media platforms. Because authorship identification is carried out by means of corpus analysis, reference and specialised corpora were compiled. Furthermore, three authors were chosen from the specialised corpora, and an extra corpus, or "suspicious text", was compiled for each. The largest suspicious text consisted of 91 Tweets (1 409 words), while the smallest consisted of 32 Tweets (412 words). These corpora are therefore considerably smaller than those used in Afrikaans authorship identification thus far. The suspicious texts were used to test the proposed method of identifying the authors of Afrikaans microblog entries. After the theoretical overview and the compilation and processing of the data, the empirical analysis was carried out. The proposed method for identifying the author of an Afrikaans microblog entry comprises stylometric, stylistic, and text analyses. Thirteen aspects that can be used in the process of authorship identification were identified: keyness; ratio analyses, including the number of sentences, words, and characters per Tweet; language relationships; n-gram analyses; readability tests; common features (the Twitter-specific features of hashtags, user mentions, and hyperlinks, as well as punctuation and capital letters); syntactic features; morphological features; semantic features; interjections; curse words; emoticons; and an error analysis. The stylometric, stylistic, and text analyses indicated that similar traits between authors can still be identified despite the limited size of the suspicious texts.
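To make a few of these aspects concrete, the following is a minimal sketch, not the study's own procedure. It covers three pieces. The first computes per-Tweet averages of some "common features" and ratio measures. The second compares character trigram profiles by cosine similarity as one possible n-gram analysis. The third scores keyness with the log-likelihood (G2) statistic often used in corpus linguistics. The thesis does not state which tools, regular expressions, n-gram sizes, or keyness statistic it used, so all function names, patterns, and parameters below are hypothetical assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical patterns for Twitter-specific features (assumption, not from the thesis).
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")
URL = re.compile(r"https?://\S+")
WORD = re.compile(r"\w+(?:['’-]\w+)*")


def tweet_features(tweets):
    """Average per-Tweet counts of hashtags, mentions, links, words, characters and capitals."""
    if not tweets:
        return {}
    totals = Counter()
    for t in tweets:
        totals["hashtags"] += len(HASHTAG.findall(t))
        totals["mentions"] += len(MENTION.findall(t))
        totals["links"] += len(URL.findall(t))
        # Count words only after removing URLs so links do not inflate word counts.
        totals["words"] += len(WORD.findall(URL.sub("", t)))
        totals["chars"] += len(t)
        totals["capitals"] += sum(1 for c in t if c.isupper())
    return {k: v / len(tweets) for k, v in totals.items()}


def char_ngrams(tweets, n=3):
    """Frequency profile of lower-cased character n-grams across all Tweets."""
    profile = Counter()
    for t in tweets:
        s = t.lower()
        profile.update(s[i:i + n] for i in range(len(s) - n + 1))
    return profile


def cosine(p, q):
    """Cosine similarity between two n-gram frequency profiles."""
    dot = sum(p[g] * q[g] for g in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def most_similar_author(suspicious, candidates, n=3):
    """Return the candidate whose n-gram profile is closest to the suspicious text."""
    target = char_ngrams(suspicious, n)
    return max(candidates, key=lambda name: cosine(target, char_ngrams(candidates[name], n)))


def log_likelihood(a, b, c, d):
    """G2 keyness of a word: a, b = its frequency in the study and reference corpora;
    c, d = total word counts of the study and reference corpora."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    if a:
        g2 += a * math.log(a / e1)
    if b:
        g2 += b * math.log(b / e2)
    return 2 * g2
```

For example, `most_similar_author(["ek is lekker bly"], {"A": ["ek is baie lekker bly vandag"], "B": ["the weather is quite nice today"]})` attributes the text to candidate `"A"`. In a fuller version, each of the thirteen aspects would be compared separately rather than folded into a single similarity score.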
The fewest similarities were found between the third suspicious text and its real author. Even in this case, the real author could be identified without reasonable doubt in 9 of the 13 aspects analysed, a result that may be considered successful. It was further determined that Afrikaans is indeed adapted by its various users to reach their communicative goals, and that certain distinctive language features are identifiable in the Afrikaans used in microblogs. Finally, this study proposes a process that can be used to identify the authors of microblogs.
- Humanities