Domain adaptation for speaker diarisation in low-resource environments
Abstract
Speaker diarisation systems aim to answer the question "who spoke when?" and can provide
valuable metadata to downstream applications, such as automatic speech recognition
systems. However, like most applications in the field of speech recognition, speaker
diarisation systems are especially challenged by domain-mismatch conditions.
In this study, we investigate methods for adapting a pre-trained diarisation system to a
new target domain when only a small in-domain corpus is available and retraining the full
system is therefore not an option. We also develop a method that uses cluster analysis to
fine-tune the adaptation process of a pre-trained speaker diarisation system. Our domain
adaptation process focuses on the statistical components of a speaker diarisation pipeline,
which are inherently domain-specific, and retrains and adapts them to the target domain. Lastly,
we demonstrate this domain adaptation process in a real-world scenario by adapting a
pre-trained diarisation system using a small in-domain dataset consisting of telephonic
speech from South African call centres. We show that the adapted system can provide
metadata that helps improve the performance of automatic speech recognition systems
through speaker-specific adaptation.