Partial duplicate file

oreille72
Posts: 1
Joined: Thu Sep 01, 2011 4:28 pm

Post by oreille72 »

Hi Everybody,

First, please excuse my English, I'm French (nobody's perfect :) )

Duplicate Cleaner is the most efficient tool I've used. May I suggest that it could be improved by searching for partial correspondence between files?
I download a lot of videos from various sites. Some of them are full movies, others are extracts. I'd be happy to delete the extracts when I have the full movie on my disks. Is that possible? For example:
if A__________________________Z is the full movie and L_________Q is a part from anywhere in this movie (it could be A___B, S____Z or anything else), I'd like to identify the segment so I can delete it.

I hope the developers' team will be interested in this idea. Thanks for everything.
Best,
Didier (from France)
Fool4UAnyway

Re: Partial duplicate file

Post by Fool4UAnyway »

Searching for exact, complete matches is fairly straightforward: as soon as you find any difference between two files, you can stop comparing them. You can apply several strategies to speed up the process and avoid lots of comparisons across multiple files. For instance, you only need to keep comparing files that have been found to be identical over the part checked so far.
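
To make the "stop at the first difference" idea concrete, here is a minimal Python sketch. It is only an illustration, not how Duplicate Cleaner works internally; the function names `files_equal` and `group_by_size` and the 64 KiB chunk size are my own assumptions.

```python
import os
from collections import defaultdict


def files_equal(path_a, path_b, chunk_size=64 * 1024):
    """Compare two files chunk by chunk, stopping at the first difference."""
    # Files of different sizes can never be exact duplicates.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # early exit: no need to read the rest
            if not chunk_a:
                return True  # both files ended without any difference


def group_by_size(paths):
    """Return lists of paths that share a size (the only possible duplicates)."""
    groups = defaultdict(list)
    for path in paths:
        groups[os.path.getsize(path)].append(path)
    return [group for group in groups.values() if len(group) > 1]
```

Grouping by size first means `files_equal` only has to run within each group, which avoids most of the cross-file comparisons.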

When you search for partial matches, it is clear that a partial file must be smaller than the original file it is a part of. But the next problem is that the part could start anywhere in the original file, which leads to a huge number of possible partial matches. Also, the results would need to be presented in a tree-like way, because files that contain smaller files may themselves be part of other, longer originals. One way to simplify the search would be to start with the small files and compare them first (or only) to slightly larger files. But a small file may also be part of multiple larger files (unlikely in the case of videos, I guess, but not impossible).
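
For what it's worth, a naive version of that search could look like the Python sketch below. It assumes the extract is a byte-for-byte slice of the larger file, which may well not hold for videos that were re-encoded or cut with a different container header. The names `find_offset` and `find_partial_duplicates` are hypothetical, and the small file is read fully into memory.

```python
import mmap
import os


def find_offset(small_path, large_path):
    """Return the byte offset where small_path's content occurs inside
    large_path, or -1 if it does not occur (or is not strictly smaller)."""
    small_size = os.path.getsize(small_path)
    large_size = os.path.getsize(large_path)
    if small_size == 0 or small_size >= large_size:
        return -1
    with open(small_path, "rb") as fs:
        needle = fs.read()  # assumption: the smaller file fits in memory
    with open(large_path, "rb") as fl:
        # Memory-map the larger file so it can be searched at any offset
        # without loading it completely.
        with mmap.mmap(fl.fileno(), 0, access=mmap.ACCESS_READ) as haystack:
            return haystack.find(needle)


def find_partial_duplicates(paths):
    """Map each file to the list of larger files that contain it."""
    by_size = sorted(paths, key=os.path.getsize)  # start with small files
    result = {}
    for i, small in enumerate(by_size):
        containers = [
            large for large in by_size[i + 1:]
            if find_offset(small, large) != -1
        ]
        if containers:
            result[small] = containers
    return result
```

Every small file is checked against every larger one, so the number of comparisons grows quadratically with the number of files, which is exactly the blow-up described above. The resulting mapping also allows one small file to list several containers.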

I guess the number of possible matches and links between files would be too much to deal with. Perhaps you could tag your videos, group them by obvious similarities, and just perform the check yourself. I wonder whether you would store all these kinds of videos if you hadn't watched them yourself anyway.