Tag Suggestion and Localization in User-Generated Videos Based on Social Knowledge
BALLAN, LAMBERTO;
2010
Abstract
Nowadays, almost every website that provides means for sharing user-generated multimedia content, such as Flickr, Facebook, YouTube and Vimeo, offers tagging functionality to let users annotate the material they want to share. The tags are then used to retrieve the uploaded content and to ease browsing and exploration of these collections, e.g. using tag clouds. However, while tagging a single image is straightforward, and sites like Flickr and Facebook even make it easy to tag portions of the uploaded photos, tagging a video sequence is more cumbersome, so users tend to tag only the overall content of a video. Moreover, the tagging process is completely manual, and users often spend as little time as possible annotating the material, resulting in a sparse annotation of the visual content. A semi-automatic process that helps users tag a video sequence would improve the quality of annotations and thus the overall user experience. While research on image tagging has received considerable attention in recent years, there are still very few works that address the problem of automatically assigning tags to videos and locating them temporally within the video sequence. In this paper we present a system for video tag suggestion and temporal localization based on collective knowledge and the visual similarity of frames. The algorithm suggests new tags that can be associated with a given keyframe by exploiting visual features together with the tags associated with videos and images uploaded to social sites like YouTube and Flickr.
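To illustrate the general idea the abstract describes — suggesting tags for a keyframe from socially tagged, visually similar content — the following is a minimal sketch of similarity-weighted tag voting over a pool of tagged images. It is not the paper's actual algorithm; the function name, descriptor format, and parameters (k, top_n) are hypothetical assumptions for this example.

```python
# Minimal sketch of similarity-based tag suggestion for a keyframe.
# Assumed (hypothetical) data layout: each item retrieved from a social site
# is a (feature_vector, tags) pair, where feature vectors are L2-normalised
# visual descriptors of the image or keyframe.

from collections import defaultdict
import numpy as np


def suggest_tags(keyframe_feat, tagged_pool, k=20, top_n=5):
    """Rank candidate tags for a keyframe by similarity-weighted voting
    over its k visually nearest neighbours in a socially tagged pool."""
    feats = np.stack([f for f, _ in tagged_pool])   # (N, d) descriptor matrix
    sims = feats @ keyframe_feat                    # cosine similarity (unit vectors)
    neighbours = np.argsort(-sims)[:k]              # indices of the k most similar items

    scores = defaultdict(float)
    for idx in neighbours:
        _, tags = tagged_pool[idx]
        for tag in tags:
            scores[tag] += float(sims[idx])         # each neighbour votes, weighted by similarity

    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Toy usage with 4-dimensional descriptors (illustrative only).
def unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

pool = [
    (unit([0.9, 0.1, 0.3, 0.3]), ["beach", "sea"]),
    (unit([0.8, 0.2, 0.4, 0.4]), ["sea", "surf"]),
    (unit([0.1, 0.9, 0.1, 0.4]), ["city", "night"]),
]
print(suggest_tags(unit([0.85, 0.15, 0.35, 0.35]), pool, k=2, top_n=3))
# e.g. ['sea', 'beach', 'surf']
```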