Abstract:
Frequency modulation (FM) features are typically extracted using a filterbank, usually based on an auditory frequency scale, however there is psychophysical evidence to suggest that this scale may not be optimal for extracting speaker-specific information. In this paper, speaker-specific information in FM features is analyzed as a function of the filterbank structure at the feature, model and classification stages. Scatter matrix based separation measures at the feature level and Kullback-Leibler distance based measures at the model level are used to analyze the discriminative contributions of the different bands. Then a series of speaker recognition experiments are performed to study how each band of the FM feature contributes to speaker recognition. A new filter bank structure is proposed that attempts to maximize the speaker-specific information in the FM feature for telephone data. Finally, the distribution of speaker-specific information is analyzed for wideband speech.