failure to detect duplicates

MaxG

failure to detect duplicates

Post by MaxG »

Does Duplicate Cleaner have any known bugs or issues with xpt files? I have a set of xpt files that I'm fairly certain are duplicates. A byte-level scan reports them all as duplicates, but an MD5 hash scan reports none of them as duplicates. Any thoughts? Thanks!
therube
Posts: 624
Joined: Tue Jun 28, 2011 4:38 pm

Re: failure to detect duplicates

Post by therube »

File type should not matter, particularly.

MD5 is known to have collisions.

Byte level is going to be the most thorough, & if that says they are dups, then you would think they are.

If you ... oops. That's backwards.

---

If byte level shows them as dups, then you would think that MD5 would too, unless there was a bug in the MD5 implementation?

What do SHA-1 & SHA-256 show?

Relatively small files (from Mozilla?)? If so, zip them up & upload them somewhere.

(Actually looks like you may be able to upload directly here to the board.)
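
For anyone who wants to cross-check the same pair of files outside Duplicate Cleaner, here is a minimal sketch in Python. It is not how DC works internally; `file_digest` and `compare_files` are hypothetical names made up for this example. It compares two files byte by byte and then by MD5, SHA-1, and SHA-256. If the bytes really are identical, all three hashes have to match as well, so a mismatch here would point at the files rather than at the hash.

```python
import filecmp
import hashlib


def file_digest(path, algorithm, chunk_size=1 << 20):
    """Hash a file's raw bytes in chunks with the named hashlib algorithm."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:  # binary mode: no newline or encoding translation
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_files(path_a, path_b):
    """Return byte-level and per-hash equality for two files (hypothetical helper)."""
    # shallow=False forces a real content comparison instead of a stat() check.
    result = {"bytes": filecmp.cmp(path_a, path_b, shallow=False)}
    for algorithm in ("md5", "sha1", "sha256"):
        result[algorithm] = (
            file_digest(path_a, algorithm) == file_digest(path_b, algorithm)
        )
    return result
```

Calling `compare_files("one.xpt", "two.xpt")` (placeholder file names) returns a small dict of True/False values, one for the byte comparison and one per hash.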
MaxG

Re: failure to detect duplicates

Post by MaxG »

Thanks, Rube. I ran SHA-1 & SHA-256 and the results are the same as MD5: no duplicates. This is puzzling, since the byte-level scan says they match, the properties are exactly the same, and the content appears the same on thorough inspection. I've only seen this occur with xpt files thus far.

Has anyone else used Duplicate Cleaner on any xpt files (or other statistical modeling program output files)?
DigitalVolcano
Site Admin
Posts: 1804
Joined: Thu Jun 09, 2011 10:04 am

Re: failure to detect duplicates

Post by DigitalVolcano »

The file type shouldn't really make a difference - DC just treats everything as binary data.

Do you have any other options specified (e.g. same date, filename, etc.)?

How many files are affected? What size are the files? I'd be interested in screenshots from scans on both sets, if that's possible.
Thanks!
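
To illustrate what "treats everything as binary data" means in practice, here is a rough Python sketch that groups every file in a folder by a hash of its raw bytes, ignoring extension, name, and date. This is only an illustration under my own assumptions and not DC's actual code; `group_by_content` is a hypothetical name. It reads each file fully into memory, so it suits small files only.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_by_content(folder, algorithm="sha256"):
    """Map each content digest to the files under `folder` that share it."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            # Only the bytes are hashed; extension, name, and date play no part.
            digest = hashlib.new(algorithm, path.read_bytes()).hexdigest()
            groups[digest].append(path)
    # Keep only digests shared by two or more files.
    return {d: paths for d, paths in groups.items() if len(paths) > 1}
```

Running it with `algorithm="md5"` and again with `algorithm="sha256"` on the same folder should give the same groupings of files, which makes it a quick sanity check against the scan results.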
lazyman

Re: failure to detect duplicates

Post by lazyman »

MaxG, if you do not find duplicates with MD5, don't use MD5. Fairly simple.