dc.contributor.advisor: Barnard, E. (en_US)
dc.contributor.author: Heyns, H. (en_US)
dc.date.accessioned: 2021-11-04T06:53:12Z
dc.date.available: 2021-11-04T06:53:12Z
dc.date.issued: 2021 (en_US)
dc.identifier.uri: https://orcid.org/0000-0002-0802-5005 (en_US)
dc.identifier.uri: http://hdl.handle.net/10394/37665
dc.description: MEng (Computer and Electronic Engineering), North-West University, Potchefstroom Campus
dc.description.abstract: Word embeddings are widely used in natural language processing tasks. Most work on word embeddings focuses on monolingual settings with large available datasets. For embeddings to be useful in a multilingual environment such as South Africa, the training techniques have to be adjusted to cater for a) multiple languages, b) smaller datasets and c) the occurrence of code-switching. One of the biggest obstacles is obtaining datasets that include examples of natural code-switching, since code-switching is generally avoided in written material. A solution to this problem is to use speech-recognised data. Embedding packages such as Word2Vec and GloVe have default hyper-parameter settings that are usually optimised for training on large datasets and evaluation on analogy tasks. When embeddings are used for problems such as text classification in our multilingual environment, the hyper-parameters have to be optimised for the specific data and task. We investigate the importance of optimising relevant hyper-parameters when training word embeddings on speech-recognised data in which code-switching occurs, and evaluate against the real-world problem of classifying radio and television recordings that contain code-switching. In this dissertation we present findings on the application of word embeddings to recognised speech in a multilingual environment.
dc.language.iso: en (en_US)
dc.publisher: North-West University (South Africa) (en_US)
dc.subject: Word embeddings
dc.subject: speech recognition
dc.subject: code-switching
dc.subject: multilingual environment
dc.title: Embedding recognized speech in a multilingual environment (en_US)
dc.type: Thesis (en_US)
dc.description.thesistype: Masters (en_US)
dc.contributor.researchID: 21021287 - Barnard, Etienne (Supervisor) (en_US)

