Now showing items 1-10 of 53
Stride and translation invariance in CNNs
(Southern African Conference for Artificial Intelligence Research, 2020)
Convolutional Neural Networks have become the standard for image classification tasks; however, these architectures are not invariant to translations of the input image. This lack of invariance is attributed to the use of ...
Tracking translation invariance in CNNs
(Southern African Conference for Artificial Intelligence Research, 2020)
Although Convolutional Neural Networks (CNNs) are widely used, their translation invariance (ability to deal with translated inputs) is still subject to some controversy. We explore this question using translation-sensitivity ...
Synthetic triphones from trajectory-based feature distributions
(Pattern Recognition Association of South Africa and Mechatronics International Conference, 2015)
We experiment with a new method to create synthetic models of rare and unseen triphones in order to supplement limited automatic speech recognition (ASR) training data. A trajectory model is used to characterise seen ...
Efficient harvesting of Internet audio for resource-scarce ASR
(Interspeech 2011, 2011)
Spoken recordings that have been transcribed for human reading (e.g. as captions for audiovisual material, or to provide alternative modes of access to recordings) are widely available in many languages. Such recordings ...
The predictability of name pronunciation errors in four South African languages
(Pattern Recognition Association of South Africa and Mechatronics International Conference, 2011)
Personal names are often pronounced in very different ways depending on the language background of the speaker. We seek to determine whether some of these pronunciation 'errors' are systematic and if so, in which ways. ...
Language identification of individual words with joint sequence models
(Interspeech 2014, 2014)
Within a multilingual automatic speech recognition (ASR) system, knowledge of the language of origin of unknown words can improve pronunciation modelling accuracy. This is of particular importance for ASR systems required ...
The NCHLT Speech Corpus of the South African languages
(Workshop Spoken Language Technologies for Under-resourced Languages (SLTU), 2014)
The NCHLT speech corpus contains wide-band speech from approximately 200 speakers per language, in each of the eleven official languages of South Africa. We describe the design and development processes that were ...
The spoken web search task at MediaEval 2011
(Acoustics, Speech and Signal Processing (ICASSP), 2012-03)
In this paper, we describe the “Spoken Web Search” Task, which was held as part of the 2011 MediaEval benchmark campaign. The purpose of this task was to perform audio search with audio input in five languages, with very ...
Code-switched English pronunciation modeling for Swahili spoken term detection
(Procedia Computer Science: Spoken Language Technology for Under-resourced Languages, 2016-05)
We investigate modeling strategies for English code-switched words as found in a Swahili spoken term detection system. Code switching, where speakers switch language in a conversation, occurs frequently in multilingual ...
Performance analysis of a multilingual directory enquiries application
(Pattern Recognition Association of South Africa and Mechatronics International Conference, 2014)
In a multilingual society such as South Africa, a practical directory enquiries (DE) application should be able to serve users from various language backgrounds with information relating to names in various languages: ...