Abstract:
Subband frame-averaged frequency modulation (FM) has recently shown
promise as a complementary feature to amplitude-based features in
several speech-based classification problems, including speaker
recognition. One problem with using FM extraction in practical
implementations is its computational complexity. Proposed is a computationally
efficient method to estimate the frame-averaged FM component
in a novel manner, using the zero-crossing counts of the signal and
the zero-crossing counts of the differentiated signal. FM components, extracted
from subband speech signals using the proposed method, form a
feature vector. Speaker recognition experiments conducted on the
NIST 2008 telephone database show that the proposed method successfully
augments mel-frequency cepstral coefficients (MFCCs) to
improve performance, obtaining relative reductions of 17% in equal
error rate when compared with an MFCC-based system.
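
As an illustration of the two quantities named above, the following minimal sketch counts the zero crossings of one frame of a subband signal and of its first difference, and converts each count to a frequency in Hz. It is not the proposed estimator: the abstract does not state how the two counts are combined, so the sketch returns them separately. The function names, the use of a first difference as the differentiator, and the test tone are assumptions made for the example.

```python
import math


def zero_crossings(x):
    """Count sign changes between consecutive samples; exact zeros are skipped."""
    count = 0
    prev = 0.0
    for v in x:
        if v == 0.0:
            continue
        if prev != 0.0 and (v > 0.0) != (prev > 0.0):
            count += 1
        prev = v
    return count


def frame_frequency_estimates(frame, fs):
    """Return (f_zc, f_dzc) in Hz for one frame sampled at fs Hz.

    f_zc  is derived from the zero-crossing count of the frame.
    f_dzc is derived from the zero-crossing count of its first difference
          (an assumed stand-in for the differentiated signal).
    A sinusoid at f Hz crosses zero 2*f times per second, hence the factor 2.
    """
    n = len(frame)
    if n < 3:
        raise ValueError("frame must contain at least 3 samples")
    diff = [frame[i + 1] - frame[i] for i in range(n - 1)]
    f_zc = zero_crossings(frame) * fs / (2.0 * (n - 1))
    f_dzc = zero_crossings(diff) * fs / (2.0 * (n - 2))
    return f_zc, f_dzc


# Hypothetical test frame: 100 ms of a 500 Hz tone sampled at 8 kHz.
fs = 8000
frame = [math.sin(2.0 * math.pi * 500.0 * k / fs + 0.3) for k in range(800)]
f_zc, f_dzc = frame_frequency_estimates(frame, fs)
print(f_zc, f_dzc)  # both estimates should lie close to 500 Hz
```

For a pure tone both estimates recover the tone frequency to within the resolution allowed by the frame length. Both counts need only sign comparisons and one subtraction per sample, which is consistent with the low computational cost claimed for the method.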