The Spoken Web Search task at MediaEval 2012

Metze, Florian; Anguera, Xavier; Barnard, Etienne; Gravier, Guillaume; Davel, Marelie H. (Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, 2013)

In this paper, we describe the “Spoken Web Search” Task, which was held as part of the 2012 MediaEval benchmark evaluation campaign. The purpose of this task was to perform audio search with audio input in four languages, ...

Towards lecture transcription in resource-scarce environments

De Villiers, Pieter; Jooste, Petri; Van Heerden, Carel J.; Barnard, Etienne (Pattern Recognition Association of South Africa (PRASA), 2012)

We present progress towards automated Lecture Transcription (LT) in resource-scarce environments. Our development has focused on the transcription of lectures in Afrikaans from two faculties at North-West University. A ...

The NCHLT Speech Corpus of the South African languages

Barnard, Etienne; Davel, Marelie H.; van Heerden, Charl; De Wet, Febe; Badenhorst, Jaco (Workshop Spoken Language Technologies for Under-resourced Languages (SLTU), 2014)

The NCHLT speech corpus contains wide-band speech from approximately 200 speakers per language, in each of the eleven official languages of South Africa. We describe the design and development processes that were ...

Language Independent Search in MediaEval's Spoken Web Search Task

Metze, Florian; Anguera, Xavier; Barnard, Etienne; Gravier, Guillaume; Davel, Marelie H. (Elsevier Ltd., 2014)

In this paper, we describe several approaches to language-independent spoken term detection and compare their performance on a common task, namely “Spoken Web Search”. The goal of this part of the MediaEval initiative is ...

A Discourse Model of Affect for Text-to-Speech Synthesis

Schlunz, Georg I.; Barnard, Etienne (Pattern Recognition Association of South Africa and Mechatronics International Conference, 2013)

This paper introduces a model of affect to improve prosody in text-to-speech synthesis. It operates on the discourse level of text to predict the underlying linguistic factors that contribute towards emotional appraisal, ...

G2P variant prediction techniques for ASR and STD

Davel, Marelie H.; van Heerden, Charl; Barnard, Etienne (Interspeech 2013, 2013)

Introducing pronunciation variants into a lexicon is a balancing act: incorporating necessary variants can improve automatic speech recognition (ASR) and spoken term detection (STD) performance by capturing some of the ...

The semi-automated creation of stratified speech corpora

Van Heerden, Carel; Barnard, Etienne; Davel, Marelie H. (Pattern Recognition Association of South Africa (PRASA), 2013)

Smartphones provide an efficient means for the collection of speech data; however, the quality of the corpora created in this fashion is not predictable. We describe an approach that allows us to post-process and rank ...

Classifying recognised speech with deep neural networks

Strydom, Rhyno A; Barnard, Etienne (Southern African Conference for Artificial Intelligence Research, 2020)

We investigate whether word embeddings using deep neural networks can assist in the analysis of text produced by a speech recognition system. In particular, we develop algorithms to identify which words are incorrectly ...

Optimising word embeddings for recognised multilingual speech

Barnard, Etienne; Heyns, Nuette (Southern African Conference for Artificial Intelligence Research, 2020)

Word embeddings are widely used in natural language processing (NLP) tasks. Most work on word embeddings focuses on monolingual languages with large available datasets. For embeddings to be useful in a multilingual ...

Benign interpolation of noise in deep learning

Davel, Marelie Hattingh; Barnard, Etienne; Theunissen, Marthinus Wilhelmus (South African Institute of Computer Scientists and Information Technologists, 2020)

The understanding of generalisation in machine learning is in a state of flux, in part due to the ability of deep learning models to interpolate noisy training data and still perform appropriately on out-of-sample data, ...