Abstract:
The QUT-NOISE-SRE protocol is designed to mix the large
QUT-NOISE database, consisting of over 10 hours of background
noise, collected across 10 unique locations covering 5
common noise scenarios, with commonly used speaker recognition
datasets such as Switchboard, Mixer and the speaker recognition
evaluation (SRE) datasets provided by NIST. By allowing
common clean speech corpora to be mixed with a wide variety
of noise conditions, environmental reverberant responses,
and signal-to-noise ratios, this protocol provides a solid basis
for the development, evaluation and benchmarking of robust
speaker recognition algorithms, and is freely available to download
alongside the QUT-NOISE database. In this work, we
use the QUT-NOISE-SRE protocol to evaluate a state-of-the-art
PLDA i-vector speaker recognition system, demonstrating
the importance of designing voice-activity-detection front-ends
specifically for speaker recognition, rather than aiming for perfect
coherence with the true speech/non-speech boundaries.
Index Terms: noisy speaker verification, speech databases,
evaluation protocols
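
As an illustration of what mixing clean speech with noise at a chosen signal-to-noise ratio involves, the following is a minimal sketch and not the protocol's actual procedure, which the abstract does not specify. The function name `mix_at_snr` and the use of mean-square power over the whole signal are assumptions made for this example; reverberation is not modelled.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech after scaling the noise to a target SNR in dB.

    Hypothetical helper for illustration only. Power is taken as the mean
    square over the full signal, which is an assumption of this sketch.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    if not speech:
        raise ValueError("signals must not be empty")

    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0:
        raise ValueError("noise must have non-zero power")

    # SNR_dB = 10 * log10(speech_power / (gain**2 * noise_power)),
    # solved here for the noise gain.
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


# Toy signals: at 0 dB the scaled noise has the same power as the speech.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
mixed = mix_at_snr(speech, noise, 0.0)
```

Lower values of `snr_db` yield a larger noise gain and therefore a noisier mixture.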