Abstract:
In typical x-vector based speaker recognition systems, standard linear
discriminant analysis (LDA) is used to transform the x-vector space with
the aim of maximizing the between-speaker discriminant information
while minimizing the within-speaker variability. For LDA, it is customary
to use all the available speakers in the speaker recognition development
dataset. In this study, we investigate whether it would be more beneficial to
estimate the between-speaker discriminant information and the within-speaker
variability using the most confusing samples and the most distant
samples (from the target speaker mean), respectively, in the LDA-based
channel compensation. The between-speaker variance is estimated using
a pairwise approach where the most confusing non-target speaker samples
are found based on the Euclidean distance between the speaker mean
and adjacent speaker’s samples. The within-speaker variance is estimated
using the mean of each speaker and the furthermost samples in the speaker
sessions. Experimental results demonstrate that the proposed LDA approach
for an x-vector based speaker recognition system achieves over
17% relative improvement in EER over standard LDA based x-vector
speaker recognition systems on the NIST 2010 coreext-coreext condition.
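
The sample selection described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (selective_scatter, lda_projection), the parameters n_confusing and n_far, and the ridge term are assumptions made for illustration, since the abstract does not say how many samples are selected or how the matrices are regularized. For brevity, the sketch also pools all non-target speakers when searching for confusing samples instead of iterating over explicit speaker pairs.

```python
import numpy as np


def selective_scatter(X, labels, n_confusing=2, n_far=2):
    """Estimate between- and within-speaker scatter from selected samples.

    X: (n_samples, dim) array of embeddings (e.g. x-vectors).
    labels: (n_samples,) array of speaker identifiers.
    n_confusing, n_far: hypothetical selection sizes (not from the paper).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dim = X.shape[1]
    Sb = np.zeros((dim, dim))
    Sw = np.zeros((dim, dim))
    for spk in np.unique(labels):
        own = X[labels == spk]
        others = X[labels != spk]
        mean = own.mean(axis=0)

        # Between-speaker term: the non-target samples closest (Euclidean)
        # to this speaker's mean are treated as the most confusing ones.
        d_other = np.linalg.norm(others - mean, axis=1)
        near = others[np.argsort(d_other)[:n_confusing]] - mean
        Sb += near.T @ near

        # Within-speaker term: this speaker's own samples that lie
        # furthest from the speaker mean.
        d_own = np.linalg.norm(own - mean, axis=1)
        far = own[np.argsort(d_own)[::-1][:n_far]] - mean
        Sw += far.T @ far
    return Sb, Sw


def lda_projection(Sb, Sw, n_dims, ridge=1e-6):
    """Return the leading eigenvectors of (Sw + ridge*I)^-1 Sb as columns.

    The ridge term is an assumption that keeps Sw invertible when only a
    few samples per speaker contribute to it.
    """
    dim = Sw.shape[0]
    M = np.linalg.solve(Sw + ridge * np.eye(dim), Sb)
    evals, evecs = np.linalg.eig(M)
    order = np.argsort(evals.real)[::-1][:n_dims]
    return evecs.real[:, order]
```

With the projection matrix W returned by lda_projection, embeddings would be transformed as X @ W before scoring. Setting n_confusing and n_far large enough to cover every sample makes the within-speaker term equal to the usual LDA within-class scatter, which gives a convenient baseline for comparison.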