Abstract:
We investigate the use of deep neural networks (DNNs)
for the speaker diarization task to improve performance under
domain-mismatched conditions. Three unsupervised domain
adaptation techniques, namely inter-dataset variability compensation
(IDVC), domain-invariant covariance normalization
(DICN), and domain mismatch modeling (DMM), are
applied to DNN-based speaker embeddings to compensate
for the mismatch in the embedding subspace. We present
results of experiments conducted on the DIHARD data, which was released
for the 2018 diarization challenge. Collected from a diverse
set of domains, this data provides very challenging domain-mismatched
conditions for the diarization task. Our results
provide insights into how the performance of our proposed
system could be further improved.