Bit-for-bit identical files not seen as duplicates.
Posted: Wed Aug 01, 2012 11:15 pm
I've got a couple of LARGE trees of .jpg files, one 38G and the other 14G. They contain numerous binary duplicate files in a generally matching folder structure.
Another program, Beyond Compare*, finds 10G bytes -- 7500+ files -- with duplicate content and duplicate file/path names (from the top of each tree downward). The ONLY difference is that the timestamps in the two trees differ by 1 hour (due to the way Windows handles daylight saving time).
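For reference, a path-matched, bit-for-bit check of the kind described above can be sketched in a few lines of Python. This is only an illustration, not how Beyond Compare or DC work internally; the function name `identical_by_path` is hypothetical. It walks one tree, looks for the same relative path in the other, and compares the bytes with `filecmp.cmp(..., shallow=False)`, which checks file size and contents but never the modification time, so a 1-hour DST offset does not matter.

```python
import filecmp
import os
from pathlib import Path


def identical_by_path(root_a, root_b):
    """Yield relative paths present in both trees with byte-identical content.

    Hypothetical helper for illustration. Timestamps are ignored:
    filecmp.cmp(shallow=False) compares size and then the raw bytes.
    """
    root_a, root_b = Path(root_a), Path(root_b)
    for dirpath, _dirnames, filenames in os.walk(root_a):
        for name in filenames:
            path_a = Path(dirpath) / name
            rel = path_a.relative_to(root_a)
            path_b = root_b / rel  # same relative path in the other tree
            if path_b.is_file() and filecmp.cmp(path_a, path_b, shallow=False):
                yield rel
```

Usage would be something like `for rel in identical_by_path("D:/Photos", "E:/Photos"): print(rel)`, where both roots are placeholder paths.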
(Many, many of these duplicates were created by my spouse copying files instead of moving them as she sorted pictures. They are true duplicate photos, courtesy of Windows Explorer drag and drop between different disk volumes.)
So I fired up DC, added the root of each tree to Scan Locations, and deselected "scan against self". On the Search Criteria tab, I selected the "Image Mode" sub-tab and set 100% similar, unchecking all boxes.
In particular "Same Created Date" and "Same Modified Date" are unchecked, suggesting that the file timestamps will NOT be considered in the comparison.
After "Scan Now", the summary shows "32227/32227 Files Scanned (40.5 GB)" with "0 Groups of duplicated" and "0 Files have duplicates (0 Bytes)"
So what am I doing wrong here, please?
If I change the % to 99%, I get the zillion duplicates I expect, but ...
At 99%, groups now include pictures which were saved at reduced jpg quality levels to make the file sizes smaller. I was hoping that reduced quality/smaller sizes would not be considered duplicates until I drop the match percentage a few more points.
-----
* Beyond Compare by Scooter Software is aimed at source code control, does not recognize the content of audio or pictures as such, and requires the folder tree structures to match. But for bit-for-bit comparisons, it's unimpeachable.
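When exact duplicates do not share the same folder structure, a common approach is to group files by size first and then by a cryptographic hash of their contents. The Python sketch below illustrates that general technique; it is not DC's algorithm, and the names `sha256_of` and `exact_duplicate_groups` are hypothetical. Unlike a "scan against self" off setting, it also reports duplicates found within a single tree.

```python
import hashlib
import os
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files are not read into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def exact_duplicate_groups(*roots):
    """Return lists of paths whose contents are byte-identical (hypothetical helper).

    Only size and content are considered; names, paths and timestamps are ignored.
    """
    by_size = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                p = Path(dirpath) / name
                by_size[p.stat().st_size].append(p)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have an exact duplicate
        by_hash = defaultdict(list)
        for p in paths:
            by_hash[sha256_of(p)].append(p)
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups
```

Grouping by size first means only files that could possibly match are hashed, which saves a lot of reading on trees of tens of gigabytes.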