Re-Scan MD5 Duplicates, byte-to-byte

Duplicate
Posts: 4
Joined: Wed May 24, 2017 5:41 pm

Re-Scan MD5 Duplicates, byte-to-byte

Post by Duplicate »

Hi

I'm using the latest version of Pro and was wondering if it was possible to scan for duplicates using the fast MD5 method, and then have the option to scan identified duplicates using the byte to byte method?

Thanks
therube
Posts: 614
Joined: Tue Jun 28, 2011 4:38 pm

Re: Re-Scan MD5 Duplicates, byte-to-byte

Post by therube »

You can change the scan method; MD5 or other hash methods, or byte-to-byte.
But you cannot do an MD5 compare & then directly send that list to be scanned byte-to-byte.

In any case, IMO, if a file compares MD5 or SHA-1 or SHA-256...
While there are potential clashes...
The chance of, much less the importance of...

Sending a spaceship to the moon is one thing.
Getting rid of duplicate MP3s is another.

Also, benchmark the various comparison methods.
There might be times where one method works out better than another, even with different source files.

viewtopic.php?p=6128#p6128


Some time back, I ran some "hash speed tests" (not with DC).

With many thousands (80K) of small files < 400KB, all hashes (& byte-to-byte) benchmarked essentially identically.

With a smaller number (~200) of larger files (4-20 MB range), a hash comparison was much faster than byte-to-byte.
(In the hash, SHA-1, case much of the work was done in RAM, rather than the hugely expensive "disk access" that byte-to-byte was requiring.)
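
For anyone wanting to run a similar benchmark on their own files, here is a minimal Python sketch. It is not how DC does its comparisons; the function names (hash_file, time_hash_compare, time_byte_compare) are made up for illustration, and it only times one pair of files at a time. Timings will also depend heavily on whether the OS already has the files cached, so treat the numbers as rough.

```python
import filecmp
import hashlib
import time


def hash_file(path, algorithm="sha1", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def time_hash_compare(path_a, path_b, algorithm="sha1"):
    """Compare two files by hash; return (same, seconds_elapsed)."""
    start = time.perf_counter()
    same = hash_file(path_a, algorithm) == hash_file(path_b, algorithm)
    return same, time.perf_counter() - start


def time_byte_compare(path_a, path_b):
    """Compare two files byte-to-byte; return (same, seconds_elapsed)."""
    # Clear filecmp's internal result cache so the comparison really runs.
    filecmp.clear_cache()
    start = time.perf_counter()
    # shallow=False makes filecmp compare contents, not just os.stat() data.
    same = filecmp.cmp(path_a, path_b, shallow=False)
    return same, time.perf_counter() - start
```

Calling both functions on the same pair, e.g. time_hash_compare("a.mp3", "b.mp3") and time_byte_compare("a.mp3", "b.mp3") (hypothetical file names), gives a result flag and an elapsed time for each method that can be compared side by side.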
Duplicate
Posts: 4
Joined: Wed May 24, 2017 5:41 pm

Re: Re-Scan MD5 Duplicates, byte-to-byte

Post by Duplicate »

Thanks, great advice.
Much appreciated :-)
wwcanoer
Posts: 49
Joined: Wed Aug 19, 2020 5:49 am

Re: Re-Scan MD5 Duplicates, byte-to-byte

Post by wwcanoer »

Old post, but I wish that I could do that too.

One way is to move the duplicate files from source 2 to a new "deleted" folder and then re-run, comparing source 1 to the "deleted" folder with the byte-to-byte method. The result should be that everything in the "deleted" folder is a duplicate, so you select all duplicates in that folder, delete them, and the folder should be empty. If there are any files left, it would be really interesting to see the files with the same size and MD5 that aren't the same.
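
Outside of the program itself, the same two-pass idea (fast MD5 pass first, then byte-to-byte only on the files the MD5 pass flagged) can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not a DC feature: md5_of, md5_groups and verify_group are hypothetical helper names, and it scans a single folder tree rather than two sources.

```python
import filecmp
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_groups(root):
    """Pass 1: group files under root by (size, MD5); keep groups of 2+."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[(path.stat().st_size, md5_of(path))].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]


def verify_group(paths):
    """Pass 2: byte-to-byte compare each file against the first in its group.

    Returns (confirmed, mismatched) lists of paths.
    """
    reference, confirmed, mismatched = paths[0], [], []
    for other in paths[1:]:
        # shallow=False forces a comparison of the actual file contents.
        if filecmp.cmp(reference, other, shallow=False):
            confirmed.append(other)
        else:
            mismatched.append(other)
    return confirmed, mismatched
```

Running verify_group on each group returned by md5_groups gives the files that are safe to treat as duplicates (confirmed) and any that share a size and MD5 but differ in content (mismatched). The sketch deletes nothing; anything in mismatched would be exactly the "really interesting" case.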
DigitalVolcano
Site Admin
Posts: 1717
Joined: Thu Jun 09, 2011 10:04 am

Re: Re-Scan MD5 Duplicates, byte-to-byte

Post by DigitalVolcano »

It would be super rare to have an MD5 collision in a normal fileset. I do have a couple of JPGs that contain different pictures but have the same MD5. They were deliberately created using a supercomputer though!
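
To put a rough number on "super rare", here is a back-of-the-envelope birthday-bound estimate. It assumes MD5 behaves like a uniformly random 128-bit function on ordinary, non-adversarial files, and the fileset size of one million files is just an illustrative figure, not something from this thread. It says nothing about deliberately constructed collisions like the JPGs mentioned above.

```latex
% Probability that at least two of n files share an MD5 digest,
% assuming digests are uniform over 2^{128} values (birthday bound):
P(\text{collision}) \approx \frac{n(n-1)}{2} \cdot \frac{1}{2^{128}}
                    \approx \frac{n^{2}}{2^{129}}

% Illustrative example with n = 10^{6} files:
P \approx \frac{10^{12}}{6.8 \times 10^{38}} \approx 1.5 \times 10^{-27}
```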