Abstract:
In this paper, the fusion of two speaker recognition subsystems,
one based on Frequency Modulation (FM) features and the other on
Mel-Frequency Cepstral Coefficient (MFCC) features, is reported. The motivation for their fusion was
to improve the recognition accuracy across different types of
channel variations, since the two features are believed to contain
complementary information. It was found that the MFCC-based
subsystem outperformed the FM-based subsystem on telephone
conversations from the NIST SRE-06 dataset, while the opposite
was true for the NIST SRE-08 telephone data. As a result, the FM-based
subsystem performed as well as the MFCC-based
subsystem, and their fusion gave up to 23% relative
improvement in terms of EER over the MFCC subsystem alone,
when evaluated on the NIST 2008 core condition.
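
To make the reported metric concrete, the following is a minimal sketch of how an equal error rate (EER) and a relative EER improvement can be computed, together with one possible way of fusing two subsystems. The abstract does not state the fusion method, so the weighted sum of scores used here is an assumption, and the function names (`equal_error_rate`, `fuse_scores`, `relative_improvement`) and the `weight` parameter are hypothetical, not taken from the paper.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Returns the mean of the miss and false-alarm rates at the threshold
    where the two rates are closest to each other.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    # Miss rate: fraction of target trials scoring below the threshold.
    miss = np.array([(target < t).mean() for t in thresholds])
    # False-alarm rate: fraction of non-target trials at or above it.
    false_alarm = np.array([(nontarget >= t).mean() for t in thresholds])
    i = int(np.argmin(np.abs(miss - false_alarm)))
    return (miss[i] + false_alarm[i]) / 2.0


def fuse_scores(scores_a, scores_b, weight=0.5):
    """Assumed fusion rule: weighted sum of two subsystems' trial scores."""
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    return weight * a + (1.0 - weight) * b


def relative_improvement(eer_baseline, eer_new):
    """Relative EER reduction with respect to the baseline system."""
    return (eer_baseline - eer_new) / eer_baseline
```

Under this definition, a 23% relative improvement means that the fused system's EER is 77% of the EER of the MFCC subsystem alone. In practice the two subsystems' scores would usually be normalised or calibrated before being combined, and the weight would be tuned on held-out data.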