A Study of X-vector Based Speaker Recognition on Short Utterances

Ahilan, K.; Sridharan, S.; Sriram, G.; Prachi, S.; Fookes, C.

Please use this identifier to cite or link to this item: http://repo.lib.jfn.ac.lk/ujrr/handle/123456789/1891

Full metadata record

DC Field	Value	Language
dc.contributor.author	Ahilan, K.
dc.contributor.author	Sridharan, S.
dc.contributor.author	Sriram, G.
dc.contributor.author	Prachi, S.
dc.contributor.author	Fookes, C.
dc.date.accessioned	2021-03-15T08:14:14Z
dc.date.accessioned	2022-06-27T10:02:20Z	-
dc.date.available	2021-03-15T08:14:14Z
dc.date.available	2022-06-27T10:02:20Z	-
dc.date.issued	2019
dc.identifier.citation	Kanagasundaram, A., Sridharan, S., Ganapathy, S., Singh, P., & Fookes, C. (2019, January). A study of x-vector based speaker recognition on short utterances. In Proceedings of the 20th Annual Conference of the International Speech Communication Association, INTERSPEECH 2019. Vol. 2019-September. (pp. 2943-2947). ISCA (International Speech Communication Association).	en_US
dc.identifier.uri	http://repo.lib.jfn.ac.lk/ujrr/handle/123456789/1891	-
dc.description.abstract	The aim of this work is to gain insight into how deep neural network (DNN) models should be trained for short utterance evaluation conditions in an x-vector based speaker verification system. The study suggests that the speaker embedding can be extracted with reduced dimensions for short utterance evaluation conditions. When the speaker embedding is extracted from a deeper layer, which has a lower dimension, the x-vector system achieves a 14% relative improvement in EER over the baseline approach on the NIST2010 5sec-5sec truncated conditions. We surmise that, since short utterances carry less phonetic information, speaker-discriminative x-vectors can be extracted from a deeper layer of the DNN, which captures less phonetic information. Another interesting finding is that the x-vector system achieves a 5% relative improvement on the NIST2010 5sec-5sec evaluation condition when the back-end PLDA is trained using short utterance development data. The results confirm the intuitive expectation that the duration of development utterances and the duration of evaluation utterances should be matched. Finally, for the duration mismatch condition, we propose a variance normalization approach for PLDA training that provides a 4% relative improvement in EER over the baseline approach.	en_US
dc.language.iso	en	en_US
dc.subject	Speaker verification	en_US
dc.subject	PLDA	en_US
dc.title	A Study of X-vector Based Speaker Recognition on Short Utterances	en_US
dc.type	Article	en_US
Appears in Collections:	Electrical & Electronic Engineering

Files in This Item:

File	Description	Size	Format
A Study of X-vector Based Speaker Recognition on Short Utterances.pdf		85.92 kB	Adobe PDF
