Abstract:
Robust speaker verification on short utterances remains a key
consideration when deploying automatic speaker recognition,
as many real-world applications have access to only
limited-duration speech data. This paper explores how recent
technologies centered on total variability modeling behave
when training and testing utterance lengths are reduced. Results
are presented that compare Joint Factor Analysis
(JFA) and i-vector based systems, including various compensation
techniques: Within-Class Covariance Normalization
(WCCN), Linear Discriminant Analysis (LDA), Scatter Difference
Nuisance Attribute Projection
(SDNAP) and Gaussian Probabilistic Linear Discriminant
Analysis (GPLDA). Speaker verification performance for utterances
with as little as 2 seconds of data taken from the NIST Speaker
Recognition Evaluations is presented to provide a clearer picture
of the current performance characteristics of these techniques
in short utterance conditions.