Digital Video Fingerprinting


A. Sarkar, B.S. Manjunath

Problem Motivation

Online piracy is extensive and a major problem for the entertainment industry, for example video entertainment companies and movie production houses. Often, as soon as a movie is released, pirated copies are uploaded (as a collection of small parts) to online repositories such as YouTube. At present, YouTube takes a video down once a complaint of copyright infringement is lodged. With a fully working system that scales to a very large database, fingerprints could instead be extracted online while a user uploads a new video, the signature comparison could be done very quickly, and the system could state explicitly whether the video is a copy or duplicate created from the frames of an already existing video. This would have a large effect on piracy: video pirates could no longer upload their content to any website that uses this duplicate detection software. Thus, instead of spending time and money to detect when pirated videos appear on the web (there are very many videos to keep track of) and then asking each website to take them down, our system would nip the problem in the bud by disallowing the illegal upload in the first place.


An End-to-end System for Efficient and Robust Detection of Duplicate Videos

We have developed a fast and accurate method for detecting duplicate videos in a large database, using very compact yet sufficiently discriminative signatures called video “fingerprints”. The Color Layout Descriptor (CLD) is chosen to create the fingerprints, as it performs better than other features on our experimental dataset and over the range of noise attacks that we considered. The fingerprints consist of CLD feature vectors for the video keyframes, which are obtained as the k-means cluster centers in the CLD feature vector space.
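The keyframe selection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each frame has already been reduced to a feature vector (CLD extraction itself is not shown), and the function name `kmeans_fingerprint`, the number of clusters `k`, the iteration count, and the seed are all hypothetical choices made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_fingerprint(frame_features, k, iterations=20, seed=0):
    """Summarize per-frame feature vectors by k cluster centers.

    Assumption: frame_features holds one descriptor per frame (a stand-in
    for CLD vectors). The returned centers play the role of the fingerprint.
    """
    rng = random.Random(seed)
    # Initialize the centers with k distinct frames chosen at random.
    centers = [list(v) for v in rng.sample(frame_features, k)]
    for _ in range(iterations):
        # Assignment step: attach every frame to its nearest center.
        clusters = [[] for _ in range(k)]
        for vec in frame_features:
            nearest = min(range(k), key=lambda i: squared_distance(vec, centers[i]))
            clusters[nearest].append(vec)
        # Update step: move each non-empty cluster's center to its mean.
        for i, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                centers[i] = [sum(m[d] for m in members) / len(members)
                              for d in range(dim)]
    return centers


# Toy input: two visually distinct "shots", three frames each.
frames = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
fingerprint = kmeans_fingerprint(frames, k=2)
```

The fingerprint size is fixed by `k` rather than by the video length, which is what keeps the signature compact; how `k` should be chosen per video is left open here.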

The algorithm is tested on a database of 38,000 videos, comprising 1,600 hours of content. For individual queries with an average duration of 60 sec (about 50% of the average database video length), the corresponding duplicate video is retrieved in 0.03 sec on a 2.33 GHz Intel Xeon CPU, with an accuracy of 97%. After finding the best-matched database video for a given query, a duplicate/non-duplicate detection module decides whether the query video is indeed a duplicate of that video. This module uses one of two methods: the first applies a threshold to the distance between the query signature and the best-matched database signature, while the second is based on the fraction of query keyframes that can be registered with the keyframes of the best-matched database video. When presented with a new dataset of 1,700 videos (75 hours of content), none of which has a corresponding duplicate in the database, only 3% were wrongly classified as duplicates of a database video.
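The two decision rules can be sketched as below. Several parts are assumptions made for illustration only: the signature distance is taken to be the sum, over query keyframes, of the distance to the nearest database keyframe; "registration" of a keyframe is approximated by a simple per-keyframe distance tolerance; and all threshold values and function names are hypothetical. The exhaustive search in `best_match` is also only a stand-in, since the reported retrieval time implies a much faster search than a linear scan.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def signature_distance(query_sig, db_sig):
    """Assumed distance: each query keyframe is charged its nearest db keyframe."""
    return sum(min(euclidean(q, x) for x in db_sig) for q in query_sig)


def best_match(query_sig, database):
    """Return the id of the database video closest to the query (linear scan)."""
    return min(database, key=lambda vid: signature_distance(query_sig, database[vid]))


def is_duplicate_by_distance(query_sig, db_sig, max_distance):
    """Method 1: threshold on the query-to-best-match signature distance."""
    return signature_distance(query_sig, db_sig) <= max_distance


def matched_fraction(query_sig, db_sig, keyframe_tolerance):
    """Fraction of query keyframes lying within the tolerance of some db keyframe.

    This is a simplified stand-in for keyframe registration.
    """
    if not query_sig:
        return 0.0
    matched = sum(
        1 for q in query_sig
        if min(euclidean(q, x) for x in db_sig) <= keyframe_tolerance
    )
    return matched / len(query_sig)


def is_duplicate_by_fraction(query_sig, db_sig, keyframe_tolerance, min_fraction):
    """Method 2: threshold on the fraction of matched query keyframes."""
    return matched_fraction(query_sig, db_sig, keyframe_tolerance) >= min_fraction


# Toy database of fingerprints keyed by a hypothetical video id.
database = {
    "video_a": [[0.0, 0.0], [1.0, 1.0]],
    "video_b": [[10.0, 10.0], [12.0, 12.0]],
}
query = [[0.0, 0.1], [1.0, 1.0]]
match_id = best_match(query, database)
duplicate = is_duplicate_by_distance(query, database[match_id], max_distance=0.5)
```

The second stage matters because retrieval always returns some best match, even for a query with no duplicate in the database; the thresholds control the trade-off between missed duplicates and false alarms.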


Prospective Applications

Task performed online:

A company can use this duplicate detection software to detect copyright infringement when a copyrighted video is uploaded by another user who claims the video as their own creation, even though it is constructed wholly from an existing copyrighted video.

Task performed offline:

The method can also be used to find duplicates that are already present in a database and may involve copyright violations. A company can take a piece of copyrighted content as a query and retrieve all database videos that are duplicates of it. The retrieved videos can then be verified for copyright infringement.



This research is supported by a grant from ONR #N00014-05-1-0816. Program manager: Dr. Ralph Wachter



  1. Anindya Sarkar, Vishwakarma Singh, Pratim Ghosh, B. S. Manjunath, Ambuj Singh,
    "Efficient and Robust Detection of Duplicate Videos in a Large Database"
    IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, no. 6, pp. 870-885, Jun. 2010.

  2. Anindya Sarkar, Vishwakarma Singh, Pratim Ghosh, B. S. Manjunath, Ambuj Singh,
    "Discussion of a Pruning Scheme for Top-K Retrievals Among Vector Quantizer Encoded Signatures"
    Technical Report, VRL, ECE, University of California, Santa Barbara, Apr. 2008.

  3. Anindya Sarkar, Pratim Ghosh, Emily Moxley, B. S. Manjunath,
    "Video Fingerprinting: Features for Duplicate and Similar Video Detection and Query-based Video Retrieval"
    Proc. SPIE - Multimedia Content Access: Algorithms and Systems II, San Jose, California, Jan. 2008.