New duplicate criteria

The best solution for finding and removing duplicate files.
luca1010
Posts: 2
Joined: Sun Dec 17, 2017 12:44 pm

New duplicate criteria

Post by luca1010 »

Hi,

Sometimes I need to find similar but not identical files, and none of the criteria currently available in Duplicate Cleaner can help.

Therefore I suggest adding two new criteria.

1) Similar size: let the user set a tolerance for sizes. For example, if the user decides to tolerate a 5% difference, then a file 1,000 bytes long can match files from 950 to 1,050 bytes. Of course, this is mutually exclusive with "same size".

2) Similar name: this is trickier. You could use a similarity algorithm (fuzzy search), such as https://en.wikipedia.org/wiki/Approxima ... g_matching and let the user decide how similar names must be to be considered identical. This is mutually exclusive with "same name".
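To make the two criteria concrete, here is a rough Python sketch of what I mean. It is only an illustration, not how Duplicate Cleaner works: the function names `similar_size` and `name_similarity` are made up, the tolerance is measured relative to the first file, and I picked `difflib.SequenceMatcher` from the standard library as one possible fuzzy matcher among many.

```python
from difflib import SequenceMatcher


def similar_size(reference: int, other: int, tolerance_pct: int) -> bool:
    """True if `other` is within tolerance_pct percent of `reference`.

    Hypothetical helper: integer arithmetic avoids rounding surprises
    at the edges of the range (e.g. exactly 950 or 1050 for 5% of 1000).
    """
    return abs(reference - other) * 100 <= tolerance_pct * reference


def name_similarity(name_a: str, name_b: str) -> float:
    """Similarity score between 0.0 and 1.0, ignoring case.

    Assumption: SequenceMatcher is just one possible algorithm;
    an edit-distance measure would work as well.
    """
    return SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()


if __name__ == "__main__":
    print(similar_size(1000, 950, 5))   # lower edge of the 5% range
    print(similar_size(1000, 949, 5))   # just outside the range
    print(name_similarity("01-kittens.zip", "1-Kittens.zip"))
```

The user would then choose the tolerance percentage and a minimum similarity score, the same way the existing criteria are chosen.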

The actual use case is having tons of files created with different zip compression levels (so slightly different sizes) and with different naming conventions.

Let's say that I have:
01-kittens.zip 1,000 bytes
1-Kittens.zip 998 bytes

They appear different, but in my context I should consider them equal, so I would use a similar-size tolerance of 3% and a suitable similarity index (which depends on the actual algorithm you would implement for fuzzy search) in order to find this "duplicate".
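As a sketch of how the combined search could behave on this example (again only an illustration under my own assumptions: the `candidate_pairs` function, the extra `puppies.zip` file, and the 0.9 name threshold are all invented here, and the size tolerance is taken relative to the larger file of each pair):

```python
from difflib import SequenceMatcher
from itertools import combinations

# (name, size in bytes); the third file is added only as a non-match.
FILES = [
    ("01-kittens.zip", 1000),
    ("1-Kittens.zip", 998),
    ("puppies.zip", 1000),
]


def candidate_pairs(files, size_pct, min_name_ratio):
    """Return pairs of names that pass BOTH the size and the name criterion."""
    pairs = []
    for (name_a, size_a), (name_b, size_b) in combinations(files, 2):
        # Size criterion: difference within size_pct percent of the larger file.
        larger = max(size_a, size_b)
        close_size = abs(size_a - size_b) * 100 <= size_pct * larger
        # Name criterion: case-insensitive similarity score from difflib.
        ratio = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
        if close_size and ratio >= min_name_ratio:
            pairs.append((name_a, name_b))
    return pairs


if __name__ == "__main__":
    # 3% size tolerance, names at least 90% similar (assumed threshold).
    print(candidate_pairs(FILES, size_pct=3, min_name_ratio=0.9))
```

With these settings only the two kittens files are paired; `puppies.zip` has a matching size but its name is too different.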

Of course, it is up to the user to apply those criteria responsibly and combine them with others in order to actually find duplicates, but I think they would really help to make this great program even better.

Regards

Luca
DigitalVolcano
Site Admin
Posts: 1717
Joined: Thu Jun 09, 2011 10:04 am

Re: New duplicate criteria

Post by DigitalVolcano »

I think the functions you are looking for are both already in Duplicate Cleaner Pro:
https://www.duplicatecleaner.com/manual ... =&sct=MA==