Abstract:
Clustering identities in a video is a useful task to aid in video
search, annotation and retrieval, and cast identification. However,
reliably clustering faces across multiple videos is a challenging task
due to variations in the appearance of the faces, as videos are
captured in an uncontrolled environment. A person’s appearance
may vary due to session variations, including lighting and
background changes, occlusions, and changes in expression and
make-up.
In this paper we propose the novel Local Total Variability
Modelling (Local TVM) approach to cluster faces across a news
video corpus, and incorporate this into a novel two-stage video
clustering system. We first cluster faces within a single video
using colour, spatial and temporal cues, after which we use
face track modelling and hierarchical agglomerative clustering
to cluster faces across the entire corpus. We compare different
face recognition approaches within this framework. Experiments
on a news video database show that the Local TVM technique is
able to effectively model the session variation observed in the data,
resulting in improved clustering performance, with much greater
computational efficiency than the other approaches compared.
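
As a rough illustration of the second stage only, the sketch below runs hierarchical agglomerative clustering over one fixed-length vector per face track. It is not the Local TVM model: the vectors are hypothetical stand-ins for whatever track-level representation the face track modelling produces, and the cosine distance, average linkage, and stopping threshold are assumptions chosen for the example rather than choices stated in the text.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def average_linkage(c1, c2, vectors):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative_cluster(vectors, threshold):
    """Merge the closest pair of clusters until none is within `threshold`.

    `vectors` holds one hypothetical track-level vector per face track.
    Returns a list of clusters, each a sorted list of track indices.
    """
    clusters = [[i] for i in range(len(vectors))]  # start with one track per cluster
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], vectors)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:  # assumed stopping rule: closest pair is too far apart
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy data: tracks 0 and 1 point one way, tracks 2 and 3 another.
track_vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
clusters = agglomerative_cluster(track_vectors, threshold=0.2)
print(clusters)
```

In this toy setting each resulting cluster would correspond to one identity across the corpus; the threshold controls how readily tracks from different videos are merged.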