failure to detect duplicates
Posted: Wed Sep 14, 2011 7:07 pm
by MaxG
Does Duplicate Cleaner have any known bugs or issues with xpt files? I have a set of xpt files that I'm fairly certain are duplicates. A byte-level scan reports all of them as duplicates, but an MD5 hash scan reports none. Any thoughts? Thanks!
Re: failure to detect duplicates
Posted: Thu Sep 15, 2011 1:55 am
by therube
File type should not matter, particularly.
MD5 is known to have collisions.
Byte level is going to be the most thorough, & if that says they are dup's, then you would think they are.
If you ... oops. That's backwards.
---
If byte level shows them as dups, then you would think that MD5 would too, unless there was a bug in the MD5 implementation?
What do SHA-1 & SHA-256 show?
Relatively small files (from Mozilla?)? If so, zip them up & upload them somewhere.
(Actually looks like you may be able to upload directly here to the board.)
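One way to cross-check outside of Duplicate Cleaner: since a hash is computed purely from the file's bytes, two files that are byte-for-byte identical have to produce the same MD5, SHA-1 and SHA-256. Below is a rough Python sketch that runs all four comparisons on a pair of files. The names `file_digest` and `compare_files` are made up for this example, and this is not how Duplicate Cleaner works internally, just an independent sanity check.

```python
import filecmp
import hashlib
import sys


def file_digest(path, algorithm, chunk_size=1 << 20):
    """Hash a file's raw bytes in chunks (hypothetical helper)."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:  # binary mode: file type is irrelevant
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_files(path_a, path_b):
    """Return byte-level and hash-level equality for two files."""
    # shallow=False forces a real content comparison, not just size/mtime
    result = {"bytes_equal": filecmp.cmp(path_a, path_b, shallow=False)}
    for algorithm in ("md5", "sha1", "sha256"):
        same = file_digest(path_a, algorithm) == file_digest(path_b, algorithm)
        result[algorithm + "_equal"] = same
    return result


if __name__ == "__main__" and len(sys.argv) == 3:
    for check, equal in compare_files(sys.argv[1], sys.argv[2]).items():
        print(check, equal)
```

If this reports `bytes_equal True` together with any hash as not equal, something is wrong with the check itself. If everything comes back equal here while the scan still shows no hash duplicates, that points at the scan settings or the tool rather than the files.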
Re: failure to detect duplicates
Posted: Thu Sep 15, 2011 2:02 pm
by MaxG
Thanks, Rube. I ran SHA-1 & SHA-256 and the results are the same as MD5: no duplicates. This is puzzling since the byte-level comparison says the files are the same, the properties are exactly the same, and the content appears the same on thorough inspection. I've only seen this occur with xpt files thus far.
Has anyone else used Duplicate Cleaner on any xpt files (or other statistical modeling program output files)?
Re: failure to detect duplicates
Posted: Thu Sep 15, 2011 7:40 pm
by DigitalVolcano
The file type shouldn't really make a difference - DC just treats everything as binary data.
Do you have any other options specified (e.g. Same date, filename, etc.)?
How many files are affected? What size are the files? I'd be interested in screenshots from scans on both sets, if that's possible.
thanks!
Re: failure to detect duplicates
Posted: Sun Sep 18, 2011 12:05 pm
by lazyman
MaxG, if you do not find duplicates with MD5, don't use MD5. Fairly simple.