Identical MD5 but different files...

Post by FB

Hi,
While searching for duplicates, two .jpg files were found with the same MD5, but they have different names and, moreover, the pictures are clearly not the same. Luckily, I checked before deleting. Any idea how this can happen, since from what I've read it's almost impossible?
Thanks
Post by FB

My mistake: it seems I made an error when comparing the files.
Post by Stu

I have two files in the same folder that have different sizes and show different pictures. They have similar names (red tailed hawk.jpg and red tailed hawk2.jpg). The scan showed both files with the same size (that of the larger file) and an identical MD5.

I am experienced, and I tried it several times, examined the files, etc.

How can that be?
Post by Stu

Re: previous post

- Please excuse my sloppy typing

- I am using version 1.4.6, downloaded yesterday - WinXP Pro - 4GB RAM

Thank you
Post by DV

DC won't even check for an MD5 match if the files are different sizes. They definitely shouldn't have the same MD5 in any case! Could you post a screenshot of the duplicate file list?
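DV's point implies that file size acts as a cheap pre-filter: only files of equal size are ever hashed, so two files of different sizes should never land in the same group. A minimal Python sketch of that ordering follows. It illustrates the idea under that assumption and is not DC's actual code; the names `md5_of` and `candidate_groups` are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def candidate_groups(paths):
    """Group files by size first; only files sharing a size are hashed.

    Returns a list of groups (lists of paths) that share both size and MD5.
    """
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[md5_of(p)].append(p)
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups
```

With this ordering, files of different sizes are never hashed against each other at all, so an MD5 "match" between them should not be reported.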
Post by anionic

May I add my tuppenceworth here? If two files' MD5s differ, then the files are definitely different, but if the MD5s are identical, the files MAY OR MAY NOT be identical. To be sure of forming duplicate-groups accurately, when a scanned file is found to have the same MD5 as a previously scanned file, their contents should be compared byte-for-byte, and a new group started if necessary.
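The verification anionic describes can be sketched in Python with the standard library's `filecmp.cmp`, passing `shallow=False` so that file contents are actually compared. This assumes the input is a list of paths that already share an MD5; `split_by_content` is a hypothetical name, and this is one possible way to start a new group when needed, not a description of what DC does.

```python
import filecmp


def split_by_content(paths):
    """Split files that share an MD5 into groups of byte-identical files.

    Each file is compared byte-for-byte with the first member of every
    existing group; if none matches, it starts a new group. Only groups
    with two or more members (true duplicates) are returned.
    """
    groups = []
    for p in paths:
        for group in groups:
            # shallow=False makes filecmp compare the file contents instead
            # of returning True just because os.stat() signatures match.
            if filecmp.cmp(group[0], p, shallow=False):
                group.append(p)
                break
        else:
            groups.append([p])  # no byte-identical group found: start one
    return [g for g in groups if len(g) > 1]
```

Because byte equality is transitive, comparing against the first member of each group is enough; a file never needs to be checked against every other member.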

If DC already does this, I am seriously impressed and will donate £5 :-) but if not, there is a risk (higher than theoretical, as MD5 isn't particularly collision resistant) that a user will be misled into deleting a unique file :-/

The risk is reduced when manually selecting files for deletion if filenames give some reassurance that duplicates are genuine, but I wouldn't, e.g., noninteractively hardlink all groups on my hard drive in case it screwed something up...
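For the noninteractive hardlinking case, one defensive pattern is to re-verify the contents immediately before each link. A minimal Python sketch, assuming a filesystem that supports hard links (e.g. NTFS or a typical Unix filesystem) and that both files are on the same volume; `safe_hardlink` and the `.tmp-link` suffix are hypothetical.

```python
import filecmp
import os


def safe_hardlink(keep, dup):
    """Replace dup with a hard link to keep, but only if contents match.

    Returns True if dup was replaced, False if the files differ.
    """
    if not filecmp.cmp(keep, dup, shallow=False):
        return False  # same hash but different bytes: leave both files alone
    tmp = dup + ".tmp-link"  # hypothetical temporary name next to dup
    os.link(keep, tmp)   # create the new hard link first...
    os.replace(tmp, dup) # ...then swap it in over dup (atomic on POSIX)
    return True
```

Creating the link under a temporary name before replacing `dup` means the duplicate is never deleted unless the new link already exists.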