Abstract:
This paper presents the LEAP system, developed for the Second
DIHARD diarization Challenge. The evaluation data in
the challenge is composed of multi-talker speech in restaurants,
doctor-patient conversations, child language acquisition
recordings in home environments, and audio extracted from YouTube
videos. The LEAP system is developed using two types of embeddings,
one based on i-vector representations and the other
based on x-vector representations. The initial diarization
output, obtained by agglomerative hierarchical clustering
(AHC) of the probabilistic linear discriminant analysis
(PLDA) scores, is refined using a Variational-Bayes hidden
Markov model (VB-HMM). We propose a modified VB-HMM
with posterior scaling, which provides significant
improvements in the final diarization error rate (DER). We also
apply domain compensation to the i-vector features to reduce
the mismatch between training and evaluation conditions. Using
the proposed approaches, we obtain relative improvements
in DER of about 7.1% for the best individual system
over the DIHARD baseline system and about 13.7%
for the final system combination on the evaluation set. An analysis
of the proposed posterior scaling method shows
that scaling improves discrimination among the HMM
states in the VB-HMM.
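
For readers unfamiliar with how a relative DER improvement is computed, the following is a minimal sketch, assuming the usual definition (absolute DER reduction divided by the baseline DER). The function name and the DER values are hypothetical and chosen only for illustration; they are not results from this work.

```python
def relative_der_improvement(baseline_der: float, system_der: float) -> float:
    """Return the relative DER reduction over a baseline, in percent.

    Both arguments are DER values expressed in percent.
    """
    if baseline_der <= 0:
        raise ValueError("baseline_der must be positive")
    return 100.0 * (baseline_der - system_der) / baseline_der


# Hypothetical DER values (in percent), used only to illustrate the formula.
example_baseline_der = 40.0
example_system_der = 36.0
example_gain = relative_der_improvement(example_baseline_der, example_system_der)
print(f"{example_gain:.1f}% relative improvement")
```

A positive value means the system has a lower DER than the baseline; a negative value means it is worse.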