Domain adaptation for speaker diarisation in low-resource environments
Van Wyk, Lucas
Speaker diarisation systems aim to answer the question "who spoke when?" and are useful in providing valuable metadata to downstream applications, such as automatic speech recognition systems. However, speaker diarisation systems, like most applications in the field of speech recognition, are especially challenged by domain-mismatch conditions. In this study, we investigate methods with which to adapt a pre-trained diarisation system to a new target domain when only a small in-domain corpus is available and retraining is therefore not an option. We also develop a method for fine-tuning the adaptation process of a pre-trained speaker diarisation system using cluster analysis. Our domain adaptation process focuses on retraining and adapting the statistical components in a speaker diarisation pipeline, which are inherently domain specific, to the target domain. Lastly, we demonstrate this domain adaptation process in a real-world scenario by adapting a pre-trained diarisation system using a small in-domain dataset consisting of telephonic speech from South African call centres. We show that the adapted system can be used to provide metadata which aids the performance of automatic speech recognition systems through speaker-specific adaptations.
- Engineering