NWU Institutional Repository

Classifying recognised speech with deep neural networks

dc.contributor.advisor: Barnard, E. [en_US]
dc.contributor.author: Strydom, R.A. [en_US]
dc.contributor.researchID: 21021287 - Barnard, Etienne (Supervisor) [en_US]
dc.date.accessioned: 2021-11-09T14:09:00Z
dc.date.available: 2021-11-09T14:09:00Z
dc.date.issued: 2021 [en_US]
dc.description: MEng (Computer and Electronic Engineering), North-West University, Potchefstroom Campus
dc.description.abstract: We investigate whether word embeddings using deep neural networks can assist in the analysis of text produced by a speech-recognition system. In particular, we develop algorithms to identify which words are incorrectly detected by a speech-recognition system in broadcast news. The multilingual corpus used in this investigation contains speech from the eleven official South African languages, as well as Hindi. Popular word embedding algorithms such as word2vec and fastText are investigated and compared with context-specific embedding representations such as doc2vec and non-context specific statistical sentence embedding methods such as term frequency-inverse document frequency (TF-IDF), which is used as our baseline method. These various embedding methods are then used as fixed length input representations for a logistic regression and feedforward neural network classifier. The output is used as an additional categorical input feature to a CatBoost classifier to determine whether the words were correctly recognised. Other methods are also investigated, including a method that uses the word embedding itself and cosine similarity between specific keywords to identify whether a specific keyword was correctly detected. When relying only on the speech-text data, the best result was obtained using the TF-IDF document embeddings as input features to a feedforward neural network. Adding the output from the feedforward neural network as an additional feature to the CatBoost classifier did not enhance the classifier’s performance compared to using the non-textual information provided, although adding the output from a weaker classifier was somewhat beneficial.
dc.description.thesistype: Masters [en_US]
dc.identifier.uri: https://orcid.org/0000-0002-4364-2148 [en_US]
dc.identifier.uri: http://hdl.handle.net/10394/37757
dc.language.iso: en [en_US]
dc.publisher: North-West University (South Africa) [en_US]
dc.subject: word embeddings
dc.subject: word2vec
dc.subject: fastText
dc.subject: doc2vec
dc.subject: TF-IDF
dc.subject: Deep Neural Networks
dc.subject: CatBoost
dc.title: Classifying recognised speech with deep neural networks [en_US]
dc.type: Thesis [en_US]

Files

Original bundle

Name: Strydom_RA.pdf
Size: 6.4 MB
Format: Adobe Portable Document Format

Collections