Classifying recognised speech with deep neural networks

Strydom, R.A.

dc.contributor.advisor	Barnard, E.	en_US
dc.contributor.author	Strydom, R.A.	en_US
dc.contributor.researchID	21021287 - Barnard, Etienne (Supervisor)	en_US
dc.date.accessioned	2021-11-09T14:09:00Z
dc.date.available	2021-11-09T14:09:00Z
dc.date.issued	2021	en_US
dc.description	MEng (Computer and Electronic Engineering), North-West University, Potchefstroom Campus
dc.description.abstract	We investigate whether word embeddings using deep neural networks can assist in the analysis of text produced by a speech-recognition system. In particular, we develop algorithms to identify which words are incorrectly detected by a speech-recognition system in broadcast news. The multilingual corpus used in this investigation contains speech from the eleven official South African languages, as well as Hindi. Popular word embedding algorithms such as word2vec and fastText are investigated and compared with context-specific embedding representations such as doc2vec, and with non-context-specific statistical sentence embedding methods such as term frequency-inverse document frequency (TF-IDF), which is used as our baseline method. These embedding methods are then used as fixed-length input representations for a logistic regression classifier and a feedforward neural network classifier. The classifier output is used as an additional categorical input feature to a CatBoost classifier to determine whether the words were correctly recognised. Other methods are also investigated, including a method that uses the word embedding itself and the cosine similarity between specific keywords to identify whether a specific keyword was correctly detected. When relying only on the speech-text data, the best result was obtained using the TF-IDF document embeddings as input features to a feedforward neural network. Adding the output from the feedforward neural network as an additional feature to the CatBoost classifier did not enhance the classifier’s performance compared with using only the non-textual information provided, although adding the output from a weaker classifier was somewhat beneficial.
dc.description.thesistype	Masters	en_US
dc.identifier.uri	https://orcid.org/0000-0002-4364-2148	en_US
dc.identifier.uri	http://hdl.handle.net/10394/37757
dc.language.iso	en	en_US
dc.publisher	North-West University (South Africa)	en_US
dc.subject	word embeddings
dc.subject	word2vec
dc.subject	fastText
dc.subject	doc2vec
dc.subject	TF-IDF
dc.subject	Deep Neural Networks
dc.subject	CatBoost
dc.title	Classifying recognised speech with deep neural networks	en_US
dc.type	Thesis	en_US
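The abstract names TF-IDF document embeddings as both the baseline and the best-performing text-only input to the feedforward classifier. As a rough sketch only (not the thesis's code), the following computes fixed-length TF-IDF vectors for a few hypothetical transcripts in plain Python. The whitespace tokenisation, the smoothed IDF formula and the L2 normalisation are assumptions, the function names are hypothetical, and the downstream classifier stage is omitted.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return a vocabulary and one L2-normalised TF-IDF vector per document.

    Assumed variant: raw term counts for TF and the smoothed
    IDF log((1 + N) / (1 + df)) + 1, one common textbook choice.
    """
    tokenised = [doc.lower().split() for doc in documents]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    n_docs = len(tokenised)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenised for tok in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

    vectors = []
    for toks in tokenised:
        counts = Counter(toks)
        # Fixed-length vector: one entry per vocabulary term.
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec))
        vectors.append([v / norm for v in vec] if norm > 0 else vec)
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical recognised-speech transcripts, for illustration only.
docs = [
    "the minister said the budget",
    "the budget was approved",
    "rain expected in the west",
]
vocab, vecs = tfidf_vectors(docs)
```

Each resulting vector has one entry per vocabulary term, so transcripts of different lengths map to inputs of the same size, which is what allows them to be fed to a logistic regression or feedforward classifier. Transcripts that share rarer terms ("budget") score higher under `cosine` than those sharing only common ones ("the").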

Files

Original bundle

Name:: Strydom_RA.pdf
Size:: 6.4 MB
Format:: Adobe Portable Document Format

Collections

Engineering