Abstract:
This paper proposes techniques to improve the performance of i-vector-based speaker verification systems when only
short utterances are available. Short-utterance i-vectors vary with the speaker, the session, and the phonetic
content of the utterance. Well-established methods such as linear discriminant analysis (LDA), source-normalized
LDA (SN-LDA) and within-class covariance normalization (WCCN) exist for compensating for session variation, but
we identify the variability introduced by differing phonetic content across utterances (utterance variation) as an additional source of
degradation when short-duration utterances are used. To compensate for utterance variation in short-utterance i-vector-based
speaker verification systems using cosine similarity scoring (CSS), we introduce a short utterance variance normalization
(SUVN) technique and a short utterance variance (SUV) modelling approach at the i-vector feature level. A
combination of SUVN with LDA and SN-LDA is proposed to compensate for session and utterance variations and is
shown to improve performance over the traditional approach of using LDA and/or SN-LDA followed by
WCCN. An alternative approach is also introduced that uses probabilistic linear discriminant analysis (PLDA)
to directly model the SUV. The combination of SUVN, LDA and SN-LDA followed by SUV PLDA modelling provides
an improvement over the baseline PLDA approach. We also show that for this combination of techniques, the utterance
variation information needs to be artificially added to full-length i-vectors for PLDA modelling.
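
As a point of reference for the baseline named above, the following is a minimal sketch of WCCN followed by cosine similarity scoring. It is illustrative only: the function names (`wccn_projection`, `cosine_score`), the small ridge term, and the synthetic low-dimensional "i-vectors" are assumptions that do not come from this paper, and the SUVN and SUV modelling steps are not shown because the abstract does not give their formulation.

```python
import numpy as np


def wccn_projection(ivectors, speaker_ids, ridge=1e-6):
    """Return B with B @ B.T equal to the inverse within-class covariance.

    The ridge term is an assumption added here for numerical stability.
    """
    ivectors = np.asarray(ivectors, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    dim = ivectors.shape[1]
    speakers = np.unique(speaker_ids)

    within = np.zeros((dim, dim))
    for spk in speakers:
        x = ivectors[speaker_ids == spk]
        centered = x - x.mean(axis=0)
        # Per-speaker covariance of session i-vectors around the speaker mean.
        within += centered.T @ centered / len(x)
    within /= len(speakers)
    within += ridge * np.eye(dim)

    # Cholesky factor of the inverse within-class covariance.
    return np.linalg.cholesky(np.linalg.inv(within))


def cosine_score(target, test, projection=None):
    """Cosine similarity between two i-vectors, optionally after projection."""
    target = np.asarray(target, dtype=float)
    test = np.asarray(test, dtype=float)
    if projection is not None:
        target = projection.T @ target
        test = projection.T @ test
    return float(target @ test / (np.linalg.norm(target) * np.linalg.norm(test)))


# Synthetic stand-in data: 5 hypothetical speakers, 10 sessions each, 4 dimensions.
rng = np.random.default_rng(0)
dim, n_speakers, n_sessions = 4, 5, 10
speaker_means = rng.normal(size=(n_speakers, dim))
ids = np.repeat(np.arange(n_speakers), n_sessions)
ivecs = speaker_means[ids] + 0.5 * rng.normal(size=(len(ids), dim))

B = wccn_projection(ivecs, ids)
same_speaker = cosine_score(ivecs[0], ivecs[1], B)
different_speaker = cosine_score(ivecs[0], ivecs[-1], B)
```

In a verification system the score would be compared against a decision threshold. Real i-vectors are typically a few hundred dimensions and would usually pass through LDA or SN-LDA before WCCN, as the abstract describes.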