Not detecting duplicates.

Posted: Wed Mar 13, 2013 9:33 am
by jacko_bss
I purchased Duplicate Cleaner Pro some time ago but have not been able to accurately detect all duplicates with it. Today I installed the latest version and tried a simple test that I have tried many times before. I created a folder and placed four copies of the same file in it, each with a different filename but identical content. All files had a ".doc" extension. My search criteria were limited to "same content". I selected any file filter, any file size and any file date. I checked for duplicates in Regular Mode and all four files were detected as duplicates.

I saved two of the ".doc" files in ".docx" format and checked again. Only the two files in .doc format were detected as duplicates. This suggests to me that Duplicate Cleaner Pro is unable to detect duplicate ".docx" files, which might explain why many of my duplicates are not detected.

I tried a quick check of another folder containing a random mix of files (as far as extensions are concerned). However, after finding 5405 duplicates, a "file too long" error was displayed, so Duplicate Cleaner Pro stopped checking and did not reveal the duplicates it had found. A scan of a different folder was successful, and the duplicates it found included ".docx" files.

Can anyone help me to eliminate the two problems mentioned above?

Re: Not detecting duplicates.

Posted: Wed Mar 13, 2013 10:26 am
by DigitalVolcano
Did you copy the docx files in Explorer, or save them out twice from Word?
If you saved them out twice, I assume the files will not be the same, as they will have different metadata inside relating to save times, number of saves, etc.

Re: the 'file too long' error - which version of Duplicate Cleaner was this? Long file name issues are dealt with in update 1.1.2

thanks!

Re: Not detecting duplicates.

Posted: Wed Mar 13, 2013 12:55 pm
by jacko_bss
Thank you for your reply.

I tried again, this time saving the same document directly from Word using a different filename each time, with two saved as ".doc" and two saved as ".docx". None of the files scanned was detected to be a duplicate.

The version of Duplicate Cleaner Pro I am using is 3.1.2. Is there an option to deal with long file names because it is frustrating to get part way through a scan only to have that error displayed?

Re: Not detecting duplicates.

Posted: Wed Mar 13, 2013 1:15 pm
by DigitalVolcano
jacko_bss wrote: I tried again, this time saving the same document directly from Word using a different filename each time, with two saved as ".doc" and two saved as ".docx". None of the files scanned was detected to be a duplicate.
Because you saved them all from Word one after the other, they have different version/time info embedded inside the doc, so they aren't exact duplicates.
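
To illustrate the point about embedded metadata: a .docx file is a zip archive, with the document body normally stored in word/document.xml and properties such as the last-modified time in docProps/core.xml. The sketch below builds two tiny stand-in archives (not real Word output; the helper names make_docx_like and body_of are made up for this example) that share the same body but carry different timestamps. It assumes only the metadata changes between saves, which may not hold for real Word files.

```python
import hashlib
import io
import zipfile


def make_docx_like(body_xml: str, modified: str) -> bytes:
    """Build a tiny zip archive laid out like a .docx (illustration only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", body_xml)
        zf.writestr("docProps/core.xml", f"<modified>{modified}</modified>")
    return buf.getvalue()


def body_of(docx_bytes: bytes) -> bytes:
    """Return the raw document body stored inside the archive."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        return zf.read("word/document.xml")


# Same text, "saved" one minute apart (hypothetical timestamps).
first = make_docx_like("<w:t>Hello</w:t>", "2013-03-13T12:50:00Z")
second = make_docx_like("<w:t>Hello</w:t>", "2013-03-13T12:51:00Z")

# Whole-file comparison sees two different files...
same_bytes = hashlib.sha256(first).digest() == hashlib.sha256(second).digest()
# ...even though the document body inside is identical.
same_body = body_of(first) == body_of(second)
print(same_bytes, same_body)
```

A byte-for-byte comparison treats these two archives as different files even though the visible text is the same, which matches the behaviour described above.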
jacko_bss wrote: The version of Duplicate Cleaner Pro I am using is 3.1.2. Is there an option to deal with long file names because it is frustrating to get part way through a scan only to have that error displayed?
Is it possible to post a screenshot of the error? It should deal with the long names automatically, and should not pop up with errors during the scan.
thanks

Re: Not detecting duplicates.

Posted: Thu Mar 14, 2013 7:09 am
by jacko_bss
"Because you saved them all from Word one after the other, they have different version/time info embedded inside the doc, so they aren't exact duplicates."

I tried again, using these formats:

.pdf, .rtf, .txt, .doc, .docx

Only the .txt files were recognised as duplicates.

What I am finding it difficult to come to grips with is why the text files were recognised as duplicates and not the others and why other identical files saved at different times are detected as duplicates whilst these files saved at different times are not. What I want to be able to do is delete any files that have exactly the same text/graphics content despite having different filenames, date saved or file type.


"Is it possible to post a screenshot of the error? It should deal with the long names automatically, and should not pop up with errors during the scan."

I have attached a screenshot.

Thanks for your help.

Re: Not detecting duplicates.

Posted: Thu Mar 14, 2013 10:09 am
by DigitalVolcano
jacko_bss wrote: What I am finding it difficult to come to grips with is why the text files were recognised as duplicates and not the others and why other identical files saved at different times are detected as duplicates whilst these files saved at different times are not. What I want to be able to do is delete any files that have exactly the same text/graphics content despite having different filenames, date saved or file type.
The thing is that these office formats can be different inside on a binary level every time you save them. The text (txt) files aren't - as this is a plain format without any embedded metadata.
If there is one byte different they won't be listed as duplicates - at least in content mode. If you copy them, rename them, etc in Windows Explorer you'll find that these copies will be found as duplicates, as they are identical.
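
As a rough sketch of what an exact content comparison involves (this is not Duplicate Cleaner's actual implementation; the function names and the choice of SHA-256 are assumptions for illustration), files can be grouped by size and a hash of their bytes. A single changed byte produces a different hash, so the file drops out of the group.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's bytes in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(folder: Path) -> list[list[Path]]:
    """Group files under folder whose size and content hash both match."""
    groups = defaultdict(list)
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            groups[(path.stat().st_size, file_digest(path))].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]


# Small demonstration with throwaway files: filenames are ignored,
# only the bytes matter.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.doc").write_bytes(b"same content")
    (root / "b.doc").write_bytes(b"same content")
    (root / "c.doc").write_bytes(b"same conteNt")  # one byte differs
    duplicate_names = [
        sorted(p.name for p in group) for group in find_exact_duplicates(root)
    ]
print(duplicate_names)
```

Here a.doc and b.doc are grouped together, while c.doc is left out because of its single differing byte.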
One way round this issue is to try the % content difference - say 90% - this may allow for the change in timestamp/versioning metadata inside the file.
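
How Duplicate Cleaner calculates its percentage is not described in this thread, so the following is only a stand-in to show the idea of a similarity threshold, using Python's difflib on raw bytes. The function names and the sample data are hypothetical. Note that compressed formats such as .docx may differ much more at the byte level than their visible content suggests.

```python
from difflib import SequenceMatcher


def similarity_percent(a: bytes, b: bytes) -> float:
    """Return a rough 0-100 similarity score between two byte strings."""
    # autojunk=False disables difflib's heuristic that ignores very
    # common elements, which would skew results on repetitive data.
    return 100.0 * SequenceMatcher(None, a, b, autojunk=False).ratio()


def is_similar(a: bytes, b: bytes, threshold: float = 90.0) -> bool:
    """Treat two byte strings as near-duplicates at or above threshold."""
    return similarity_percent(a, b) >= threshold


# Two made-up "files": identical body, timestamp differs by one character.
file_a = b"BODY" * 50 + b"saved 12:50"
file_b = b"BODY" * 50 + b"saved 12:51"

score = similarity_percent(file_a, file_b)
print(file_a == file_b, is_similar(file_a, file_b))
```

The two inputs are not byte-identical, so an exact comparison rejects them, but they clear a 90% threshold comfortably. SequenceMatcher is slow on large inputs, so this is for illustration rather than for scanning real folders.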
jacko_bss wrote: I have attached a screenshot.
Thanks. I've worked out that it's the Zip files inside very long paths that are causing problems. We'll get to work on a fix! Note that Windows operating system support for very long path/filenames is patchy at best...
As a workaround, turn off the 'Scan in Zip files' option for these problem scans.
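
To find out in advance which files sit in very long paths, something like the sketch below could be run over the folder before scanning. It assumes the classic Windows MAX_PATH limit of 260 characters is the relevant threshold, which is not confirmed in this thread, and find_long_paths is a made-up helper name. On Windows, walking into extremely deep folders may itself fail, so treat this as a starting point.

```python
import os
import tempfile

MAX_PATH = 260  # classic Windows path limit; an assumption for this sketch


def find_long_paths(root: str, limit: int = MAX_PATH) -> list[str]:
    """Return files under root whose absolute path has limit or more characters."""
    long_paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.abspath(os.path.join(dirpath, name))
            if len(full) >= limit:
                long_paths.append(full)
    return sorted(long_paths)


# Demonstration with a lowered limit so the example stays small.
with tempfile.TemporaryDirectory() as tmp:
    root = os.path.abspath(tmp)
    for name in ("short.txt", "a_much_longer_file_name.zip"):
        with open(os.path.join(root, name), "w") as fh:
            fh.write("x")
    demo_limit = len(root) + 15
    flagged = [os.path.basename(p) for p in find_long_paths(root, demo_limit)]
print(flagged)
```

For a real check, call find_long_paths on the scan folder with the default limit and look for Zip files in the result.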

thanks.

Re: Not detecting duplicates.

Posted: Fri Mar 15, 2013 4:39 am
by jacko_bss
Thanks for your helpful reply.

I will continue to persevere with this software because, although it is not exactly what I envisioned, it still catches many of my duplicates.