Classifying recognised speech with deep neural networks
dc.contributor.advisor | Barnard, E. | en_US |
dc.contributor.author | Strydom, R.A. | en_US |
dc.contributor.researchID | 21021287 - Barnard, Etienne (Supervisor) | en_US |
dc.date.accessioned | 2021-11-09T14:09:00Z | |
dc.date.available | 2021-11-09T14:09:00Z | |
dc.date.issued | 2021 | en_US |
dc.description | MEng (Computer and Electronic Engineering), North-West University, Potchefstroom Campus | |
dc.description.abstract | We investigate whether word embeddings using deep neural networks can assist in the analysis of text produced by a speech-recognition system. In particular, we develop algorithms to identify which words are incorrectly detected by a speech-recognition system in broadcast news. The multilingual corpus used in this investigation contains speech from the eleven official South African languages, as well as Hindi. Popular word embedding algorithms such as word2vec and fastText are investigated and compared with context-specific embedding representations such as doc2vec and non-context specific statistical sentence embedding methods such as term frequency-inverse document frequency (TF-IDF), which is used as our baseline method. These various embedding methods are then used as fixed length input representations for a logistic regression and feedforward neural network classifier. The output is used as an additional categorical input feature to a CatBoost classifier to determine whether the words were correctly recognised. Other methods are also investigated, including a method that uses the word embedding itself and cosine similarity between specific keywords to identify whether a specific keyword was correctly detected. When relying only on the speech-text data, the best result was obtained using the TF-IDF document embeddings as input features to a feedforward neural network. Adding the output from the feedforward neural network as an additional feature to the CatBoost classifier did not enhance the classifier’s performance compared to using the non-textual information provided, although adding the output from a weaker classifier was somewhat beneficial. | |
dc.description.thesistype | Masters | en_US |
dc.identifier.uri | https://orcid.org/0000-0002-4364-2148 | en_US |
dc.identifier.uri | http://hdl.handle.net/10394/37757 | |
dc.language.iso | en | en_US |
dc.publisher | North-West University (South Africa) | en_US |
dc.subject | word embeddings | |
dc.subject | word2vec | |
dc.subject | fastText | |
dc.subject | doc2vec | |
dc.subject | TF-IDF | |
dc.subject | Deep Neural Networks | |
dc.subject | CatBoost | |
dc.title | Classifying recognised speech with deep neural networks | en_US |
dc.type | Thesis | en_US |
Files
Original bundle
- Name: Strydom_RA.pdf
- Size: 6.4 MB
- Format: Adobe Portable Document Format