Comparison clarification

Posted: Fri May 19, 2017 9:28 pm
by tomtom
Hi,
I was just wondering if anyone knows how Duplicate Cleaner Pro uses the hashes.
I mean, it seems that it calculates e.g. an MD5 hash for every file and compares only on these. Is this correct?
And, in that case, why are no additional checks made to verify that a match isn't an accidental clash? It could compute an additional salted MD5 hash and compare, or use SHA-1 or even byte-to-byte, and perhaps compare the size as well.
Perhaps it would be an idea, instead of only allowing users to change the comparison method (byte-to-byte, MD5, SHA-1, and so on), to let the user add an additional verification that runs only if the files are initially (e.g. by MD5) found to be identical.
The reason I ask is that I am currently processing an extremely large number of files and am concerned that there could be a single clash, which would be rather unfortunate.
/tom

Re: Comparison clarification

Posted: Sat May 20, 2017 3:57 am
by therube
Duplicate Cleaner is not limited to MD5 hash.
• Byte-to-byte (Compares all identically sized files against each other on a byte-by-byte basis)
• MD5 (Hash algorithm - fastest)
• SHA-1 (Hash algorithm - slower)
• SHA-256 (Hash algorithm - slowest)
Options | More Options -> Advanced settings.

https://www.duplicatecleaner.com/manual ... =&sct=NTA0

Re: Comparison clarification

Posted: Sat May 20, 2017 1:56 pm
by DigitalVolcano
I think the chance of an accidental hash collision with MD5 is so vanishingly small as not to be worth worrying about (unlike CRC-32). Also, files are only compared by hash if their sizes are identical, which greatly reduces the number of MD5 comparisons made.
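
To put a rough number on "vanishingly small", here is a minimal sketch of the usual birthday-bound estimate. It assumes hash values are uniformly distributed and that collisions are accidental rather than deliberately crafted; the function name and the file count of one million are illustrative only. It also ignores the size check, which would make a false match even less likely.

```python
import math


def collision_probability(num_files: int, hash_bits: int) -> float:
    """Approximate chance that at least two of num_files distinct files
    share a hash value, using p ~ 1 - exp(-n(n-1) / 2^(bits+1))."""
    pairs_term = num_files * (num_files - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-pairs_term)


# Illustrative file count; not taken from the thread.
for name, bits in [("CRC-32", 32), ("MD5", 128), ("SHA-1", 160), ("SHA-256", 256)]:
    print(name, collision_probability(1_000_000, bits))
```

Under these assumptions a 32-bit checksum is almost certain to collide somewhere among a million distinct files, while the estimate for a 128-bit hash stays many orders of magnitude below anything that matters in practice.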

As therube mentions, you can change the hash type in the options menu if you are worried.
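
For anyone who wants the extra verification tomtom describes, here is a rough sketch of the general idea: group by size first, then by MD5, then optionally confirm each hash match byte-by-byte. This is not Duplicate Cleaner's actual code; the names find_duplicates and file_md5 are made up for illustration.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths, verify=True):
    """Return lists of paths with identical content (hypothetical helper)."""
    # Step 1: only files of the same size can be duplicates.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        # Step 2: hash only the files that share a size.
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[file_md5(path)].append(path)
        for candidates in by_hash.values():
            if len(candidates) < 2:
                continue
            if not verify:
                groups.append(candidates)
                continue
            # Step 3: split hash matches into byte-identical clusters.
            clusters = []
            for path in candidates:
                for cluster in clusters:
                    if filecmp.cmp(cluster[0], path, shallow=False):
                        cluster.append(path)
                        break
                else:
                    clusters.append([path])
            groups.extend(c for c in clusters if len(c) > 1)
    return groups
```

With verify=True the byte-by-byte pass only runs on files that already match on both size and hash, so the extra cost is limited to reading the actual duplicate candidates a second time.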