Comparison clarification

tomtom
Posts: 1
Joined: Fri May 19, 2017 9:19 pm

Comparison clarification

Post by tomtom »

Hi,
Does anyone know how Duplicate Cleaner Pro uses hashes?
It seems that it calculates a hash (e.g. MD5) for every file and compares only those hashes. Is this correct?
If so, why are no additional checks made to verify that a match isn't an accidental collision? It could compute a second, salted MD5 hash and compare that, or use SHA-1 or even a byte-by-byte comparison, and perhaps compare the file sizes as well.
Perhaps, instead of only letting users change the comparison method (byte-by-byte, MD5, SHA-1, and so on), it would be an idea to let the user add an additional verification step that runs only when the files are initially found to be identical (e.g. by MD5).
The reason I ask is that I am currently processing a very large number of files and am concerned that there could be a single collision, which would be rather unfortunate.
/tom
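
For illustration, the staged check described above (hash first, then an extra verification only for files that match) could be sketched roughly as follows. This is only a minimal sketch and not how Duplicate Cleaner works internally; the function names `file_md5` and `find_verified_duplicates` are hypothetical, and it assumes all files are readable and do not change while the scan runs.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_verified_duplicates(paths):
    """Return groups of paths whose contents are byte-for-byte identical."""
    # Stage 1: only files of the same size can be duplicates.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    # Stage 2: hash only the files that share their size with another file.
    by_hash = defaultdict(list)
    for size, same_size in by_size.items():
        if len(same_size) < 2:
            continue
        for path in same_size:
            by_hash[(size, file_md5(path))].append(path)

    # Stage 3: confirm every hash match with a full byte-by-byte comparison,
    # so an accidental hash collision can never be reported as a duplicate.
    groups = []
    for candidates in by_hash.values():
        remaining = candidates
        while len(remaining) > 1:
            first, rest = remaining[0], remaining[1:]
            same = [first] + [
                p for p in rest if filecmp.cmp(first, p, shallow=False)
            ]
            if len(same) > 1:
                groups.append(same)
            remaining = [p for p in rest if p not in same]
    return groups
```

The byte-by-byte stage only touches files that already agree on size and hash, so the extra cost is limited to the files that would be reported as duplicates anyway.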
therube
Posts: 614
Joined: Tue Jun 28, 2011 4:38 pm

Re: Comparison clarification

Post by therube »

Duplicate Cleaner is not limited to MD5 hashes. The comparison methods are:
• Byte-to-byte (compares all identically sized files against each other on a byte-by-byte basis)
• MD5 (hash algorithm, fastest)
• SHA-1 (hash algorithm, slower)
• SHA-256 (hash algorithm, slowest)
The setting is under Options | More Options -> Advanced settings.

https://www.duplicatecleaner.com/manual ... =&sct=NTA0
DigitalVolcano
Site Admin
Posts: 1717
Joined: Thu Jun 09, 2011 10:04 am

Re: Comparison clarification

Post by DigitalVolcano »

I think the chance of an MD5 hash collision is so vanishingly small that it is not worth worrying about (unlike with CRC-32). Also, files are matched on identical size first, which greatly reduces the number of MD5 comparisons made.

As therube mentions, you can change the hash type in the options menu if you are worried.
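
To put a rough number on "vanishingly small", the usual birthday-bound approximation can be evaluated for different hash lengths. The sketch below assumes that hash values behave as if they were uniformly random and that no files were deliberately crafted to collide (crafted collisions are known for MD5 and SHA-1); the function name `collision_probability` is hypothetical. It also ignores the size check, which only lowers the odds further. Under those assumptions, a billion files hashed with MD5 give a probability on the order of 10^-21, while a 32-bit checksum such as CRC-32 is practically certain to collide at that scale.

```python
import math


def collision_probability(num_files, hash_bits):
    """Birthday-bound estimate of at least one accidental collision.

    Assumes hash values are uniformly distributed over 2**hash_bits outputs.
    """
    pairs = num_files * (num_files - 1) / 2
    # 1 - exp(-pairs / 2**bits), using expm1 so tiny values stay accurate.
    return -math.expm1(-pairs / 2.0 ** hash_bits)


# Compare the hash lengths mentioned in this thread for one billion files.
for name, bits in [("CRC-32", 32), ("MD5", 128), ("SHA-1", 160), ("SHA-256", 256)]:
    print(f"{name:8s} {collision_probability(10**9, bits):.3e}")
```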