The answer to your question requires a little explanation. This will also help you determine which mode to use for specific tasks in the future.
The difference between image mode and regular mode is that regular mode simply compares the files themselves. It finds duplicates by file attributes such as filename and date or, if you specify, by actual byte content.
Image Mode actually looks at the picture and can help you find duplicates of the same image, even if the files themselves are different (e.g. the same picture at a different size or resolution, in a different orientation such as upside down or mirrored, or with a different color scheme).
So the question you have to answer is, what are you concerned about:
-Truly identical files (i.e. files that are exactly the same, byte for byte)
-Files that simply have the same name, timestamp, and size
-Images that are the same picture, but may have different characteristics (e.g. one is black & white and one is color, or one is thumbnail size and another is full resolution)
As for MD5 and byte-to-byte:
MD5 is a hash algorithm. Basically, the file is read byte by byte and run through the algorithm, which spits out a fixed-length value (for MD5, usually displayed as 32 hexadecimal characters). This value is supposed to function like a fingerprint: it is intended to be unique to the exact contents of a file. The idea is that if two files differ even slightly (say they are identical except for one single byte), the resulting values will be totally different.
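To make the fingerprint idea concrete, here is a minimal sketch using Python's standard hashlib module. The two sample strings are made up for illustration and differ only in their last word by a single byte.

```python
import hashlib

# Two inputs that differ in exactly one byte ("dog" vs "cog").
data_a = b"The quick brown fox jumps over the lazy dog"
data_b = b"The quick brown fox jumps over the lazy cog"

digest_a = hashlib.md5(data_a).hexdigest()
digest_b = hashlib.md5(data_b).hexdigest()

# Each digest is 32 hex characters (128 bits), regardless of input size.
print(digest_a)
print(digest_b)
print("same fingerprint?", digest_a == digest_b)
```

Running it prints two completely different 32-character values, even though the inputs are almost identical.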
Of course, as you might imagine, because hash algorithms are designed to spit out a value with a specific bit length no matter how big the file itself is (MD5 is 128-bit), there is always the possibility of a "collision", that is, two different data sources that spit out the same value. The odds of this happening by accident are very low, but collisions have been deliberately constructed for MD5, and once that can be done the hash is said to be "broken" and is no longer considered safe to use for cryptographic purposes and the like. (This is one reason hash functions with longer values are considered more secure than shorter ones.)
For the purpose of deduplication, the risk is even less of a problem. For one, you're not trying to secure data, just determine whether you have duplicates. For another, the odds that you'll hit a collision by accident in your specific set of files are basically zero.
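For a rough sense of what "basically zero" means, here is a back-of-the-envelope sketch using the standard birthday-bound approximation. It assumes the hash behaves like a uniformly random 128-bit value and that nobody is deliberately crafting colliding files; the function name and the one-million-file figure are just illustrative.

```python
def collision_probability(num_files: int, hash_bits: int = 128) -> float:
    """Approximate chance that any two of num_files share a hash.

    Birthday-bound estimate: (number of file pairs) / (number of hash values).
    Only valid while the result is much smaller than 1.
    """
    pairs = num_files * (num_files - 1) / 2
    return pairs / 2 ** hash_bits


# Hypothetical collection of one million files.
print(collision_probability(1_000_000))
```

For a million files this comes out on the order of 10^-27, which is far below the chance of a disk read error.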
However, unless you really need a hash value for some reason, you might as well just run a byte-to-byte comparison: not only does it eliminate the risk of a collision, it should generally be faster too when checking one file against another.
Byte-to-byte:-
- Iteratively reads all bytes from File A
- Iteratively reads all bytes from File B
- Compares the bytes read from A and B
MD5 hashing algorithm:-
- Iteratively reads all bytes from File A
- Computes File A hash
- Iteratively reads all bytes from File B
- Computes File B hash
- Compares File A and File B hash
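The MD5 steps above can be sketched roughly like this in Python. This is only an illustration of the general approach, not how Duplicate Cleaner actually does it; the function names and the 64 KB block size are my own choices.

```python
import hashlib


def md5_of_file(path, block_size=65536):
    """Read the file block by block and return its MD5 as a hex string."""
    hasher = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for block in iter(lambda: f.read(block_size), b""):
            hasher.update(block)
    return hasher.hexdigest()


def same_by_hash(path_a, path_b):
    """Both files are always read in full before the hashes are compared."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

Note that both files get read from start to finish no matter what, even if they differ in the very first byte.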
Not only does the hash computation consume more CPU power, but a matching hash is also not a 100% guarantee that the files are equal, since a hash collision is a possibility.
That being said, there are a few things that can be done to ensure better or faster byte-by-byte data checking:-
-Have it stop checking between two files on first discovery of inequality
-Read more bytes per block
-If the compared file sets are on different physical disks, multithread reads
(It's possible these have already been implemented in Duplicate Cleaner, and the optimum read bytes per block is already in place, but you get the idea.)
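Here is a minimal sketch of the first two ideas (stop at the first difference, read bigger blocks). Again, this is my own illustration and not Duplicate Cleaner's code. The function name and the 1 MB block size are arbitrary, and the up-front size check is an extra shortcut I added that is not in the list above. Multithreaded reads are left out to keep it short.

```python
import os


def files_identical(path_a, path_b, block_size=1024 * 1024):
    """Byte-to-byte comparison that bails out as early as it can."""
    # Extra shortcut (my assumption): files of different sizes
    # can never be byte-identical, so don't read them at all.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            block_a = fa.read(block_size)
            block_b = fb.read(block_size)
            if block_a != block_b:
                return False  # stop on first discovery of inequality
            if not block_a:
                return True  # both files ended with no difference found
```

If you just want something ready-made, Python's standard library has filecmp.cmp(path_a, path_b, shallow=False), which also compares file contents.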
So just to reiterate:
-Use byte-to-byte to find exact, byte-for-byte duplicates of files.
-Use image mode (and the various parameters) to find duplicates of pictures (even if the pictures are different sizes or different color schemes, etc.)
You can even set just how similar the images should be (e.g. find duplicate images that are 85% similar). You might play around with those settings in the Image mode and see how their results differ.
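To give a feel for what a similarity percentage can mean, here is a toy sketch of one common technique, the "average hash". I do not know which algorithm Duplicate Cleaner uses, so treat this purely as an illustration. It assumes each image has already been shrunk to the same tiny grayscale grid (here 2x2, given as plain lists of brightness values); all names and sample data are made up.

```python
def average_hash(pixels):
    """One bit per pixel: 1 if brighter than the image's mean, else 0."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def similarity(pixels_a, pixels_b):
    """Percentage of matching bits between the two average hashes."""
    bits_a = average_hash(pixels_a)
    bits_b = average_hash(pixels_b)
    matches = sum(a == b for a, b in zip(bits_a, bits_b))
    return 100.0 * matches / len(bits_a)


original = [[10, 200], [220, 30]]
brighter = [[40, 230], [250, 60]]   # same picture, every pixel +30
different = [[200, 10], [30, 220]]  # light and dark areas swapped

print(similarity(original, brighter))   # 100.0
print(similarity(original, different))  # 0.0
```

The brightened copy has completely different bytes from the original, so byte-to-byte and MD5 would both say "not a duplicate", yet it scores 100% here. That is the kind of match image mode is for.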