Abstract:
This paper studies the performance degradation
of a Gaussian probabilistic linear discriminant analysis
(GPLDA) speaker verification system when only
short-utterance data is available for system
development. We then introduce a number of techniques,
including utterance partitioning and source-normalised
weighted linear discriminant analysis (SN-WLDA) projections,
to improve speaker verification
performance in such conditions. Experimental studies
show that when only short-utterance data is available for
speaker verification development, the GPLDA system
achieves its best overall performance with a smaller number of universal
background model (UBM) components. This is a useful
observation, because fewer UBM components significantly reduce the
computational complexity of a speaker verification system.
In limited session data conditions,
we propose a simple utterance-partitioning technique
which, when applied to the LDA-projected GPLDA system,
yields over 8% relative improvement in EER over the baseline system on the NIST 2008 truncated 10–10 s conditions.
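To make the partitioning idea concrete, here is a minimal sketch, assuming that each development utterance is a frame-level feature matrix of shape (frames, dimensions) and that it is cut into a fixed number of contiguous, roughly equal-length, non-overlapping segments, each of which is then treated as a separate session. The abstract does not specify the segment count, whether segments overlap, or whether the full utterance is also kept, so those choices and the helper names `partition_utterance` and `expand_sessions` are hypothetical.

```python
import numpy as np


def partition_utterance(features: np.ndarray, num_parts: int) -> list[np.ndarray]:
    """Split a (num_frames, feat_dim) feature matrix into num_parts contiguous,
    roughly equal-length segments (assumption: no overlap between segments)."""
    if num_parts < 1:
        raise ValueError("num_parts must be >= 1")
    if features.shape[0] < num_parts:
        raise ValueError("not enough frames for the requested number of partitions")
    # np.array_split tolerates lengths that do not divide evenly:
    # the first (num_frames % num_parts) segments get one extra frame.
    return np.array_split(features, num_parts, axis=0)


def expand_sessions(
    speaker_utts: dict[str, list[np.ndarray]], num_parts: int
) -> dict[str, list[np.ndarray]]:
    """Replace every utterance of every speaker with its partitions, so each
    speaker appears to have num_parts times as many sessions."""
    return {
        spk: [seg for utt in utts for seg in partition_utterance(utt, num_parts)]
        for spk, utts in speaker_utts.items()
    }
```

For example, a 10 s utterance at a hypothetical 100 frames per second (1000 frames) split into 3 parts gives segments of 334, 333 and 333 frames. In a typical i-vector pipeline, an i-vector would then be extracted from each segment before GPLDA training, which is where the apparent increase in sessions would enter.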
We conjecture that this improvement arises from the
apparent increase in the number of sessions produced by
our partitioning technique, which helps to better estimate the
GPLDA parameters. Further, the SN-WLDA-projected GPLDA system with utterance partitioning
shows over 16% and 6% relative improvement
in EER over the LDA-projected GPLDA systems
on the NIST 2008 truncated 10–10 s interview-interview condition
and the NIST 2010 truncated 10–10 s interview-interview
and telephone-telephone conditions, respectively.
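The EER values and relative improvements quoted above can be illustrated with a short sketch. It assumes trial scores are available as two arrays (target and non-target trials) and estimates the EER by sweeping candidate thresholds and taking the point where the false-acceptance and false-rejection rates are closest. This is a simple approximation rather than an interpolated estimate, and the function names are hypothetical. Relative improvement is computed as 100 × (EER_baseline − EER_new) / EER_baseline.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores) -> float:
    """Approximate EER: sweep thresholds over all observed scores and return the
    mean of FAR and FRR at the threshold where they are closest."""
    tgt = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([tgt, non]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        frr = np.mean(tgt < t)   # fraction of target trials rejected
        far = np.mean(non >= t)  # fraction of non-target trials accepted
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return float(eer)


def relative_improvement(baseline_eer: float, new_eer: float) -> float:
    """Relative EER reduction of a new system over a baseline, in percent."""
    if baseline_eer <= 0:
        raise ValueError("baseline EER must be positive")
    return 100.0 * (baseline_eer - new_eer) / baseline_eer
```

For instance, a drop from a hypothetical baseline EER of 10.0% to 9.2% corresponds to a relative improvement of about 8%. These numbers are placeholders for illustration, not results from this paper.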