Abstract:
Most conventional features used in speaker recognition are
based on spectral envelope characterizations such as Mel-scale
filterbank cepstral coefficients (MFCC), Linear Prediction
Cepstral Coefficients (LPCC) and Perceptual Linear Prediction
(PLP). The MFCC’s success has seen it become a de facto standard
feature for speaker recognition. Alternative features that
convey information other than the average subband energy, have
been proposed, such as frequency modulation (FM) and subband
spectral centroid features. In this study, we investigate
the characterization of subband energy as a two-dimensional
feature, comprising Spectral Centroid Magnitude (SCM) and
Spectral Centroid Frequency (SCF). Experiments carried
out on the NIST 2001 and NIST 2006 databases using SCF,
SCM and their fusion suggest that the combination of SCM and
SCF is somewhat more accurate than conventional
MFCCs, and that both fuse effectively with MFCCs. We also
show that frame-averaged FM features are essentially centroid
features, and provide an SCF implementation that improves on
the speaker recognition performance of both subband spectral
centroid and FM features.
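
To make the two-dimensional characterization concrete, the following is a
minimal sketch of per-subband centroid features for a single frame. The
abstract does not give the exact definitions, so the formulas here are
assumptions: SCF is taken as the magnitude-weighted mean frequency in a
subband, and SCM as the frequency-weighted average magnitude. The function
name, the rectangular subbands and the Hamming window are illustrative
choices, not the implementation evaluated in this study.

```python
import numpy as np

def subband_centroid_features(frame, sample_rate, band_edges_hz):
    """Return (scf, scm) arrays with one value per subband for one frame.

    Assumed definitions (not taken from the abstract):
      SCF_k = sum(f * |S(f)|) / sum(|S(f)|)   over subband k
      SCM_k = sum(f * |S(f)|) / sum(f)        over subband k
    """
    windowed = frame * np.hamming(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    scf, scm = [], []
    for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        in_band = (freqs >= lo) & (freqs < hi)   # rectangular subband
        f, s = freqs[in_band], magnitude[in_band]
        weighted = (f * s).sum()
        # Fall back to the band centre when the subband carries no energy.
        scf.append(weighted / s.sum() if s.sum() > 0 else 0.5 * (lo + hi))
        scm.append(weighted / f.sum() if f.sum() > 0 else 0.0)
    return np.array(scf), np.array(scm)

# Usage: a 1 kHz tone sampled at 8 kHz, split into three subbands.
t = np.arange(256) / 8000.0
tone = np.sin(2 * np.pi * 1000.0 * t)
scf, scm = subband_centroid_features(tone, 8000, [0, 500, 1500, 4000])
```

In this sketch SCF locates where the energy sits within each subband, while
SCM plays a role comparable to the average subband energy that MFCC-style
features capture.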