Abstract:
This paper analyzes how much in-domain (NIST SRE) data is
required to train a probabilistic linear discriminant analysis
(PLDA) speaker verification system based on out-domain
(Switchboard) total variability subspaces.
By limiting the number of speakers, the number of sessions per
speaker, and the length of active speech per session available in
the target domain for PLDA training, we investigate the relative
effect of these three parameters on PLDA speaker verification
performance on the NIST 2008 and NIST 2010 speaker
recognition evaluation datasets. Experimental results indicate
that, while these parameters are strongly interdependent,
outperforming out-domain PLDA training requires more than 10
seconds of active speech per session, at least 4 sessions per
speaker, and a minimum of 800 speakers. If further data is available,
considerable improvement can be made over solely out-domain
PLDA training.