I am testing the trial version of Duplicate Cleaner 3.0.4 and set up some test files to see how the program works. I have a Word 2010 document saved as a .docx and made a copy of it in the same folder. I opened the copy and swapped a single character 'a' with a single character 'e' to make the files marginally different (the file sizes differ by less than 1 KB). I then saved it as a 97-2003 document so that it has the .doc extension.
When I run a search for files with similar content, I can never get the program to identify those two documents as similar. I've tried both 90% and 60% similarity, and neither finds them. To test whether .docx versus .doc mattered, I copied the .docx file again and renamed the copy to .xlsx without changing the content. That file is found as a duplicate at both 90% and 60%, so I don't think the file extension is the cause.
Are there any known bugs in the similar content percentage function, or is there a way to fix this?
Thanks.
Similar Content Search Criteria
- DigitalVolcano
- Site Admin
- Posts: 1731
- Joined: Thu Jun 09, 2011 10:04 am
Re: Similar Content Search Criteria
Docx and doc are totally different formats internally - docx is basically a zip file, so I'm not surprised there is no similarity. Not sure of a way round this when checking on a binary level.
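To illustrate why a byte-level comparison struggles with compressed containers, here is a minimal Python sketch. It does not reflect how Duplicate Cleaner actually measures similarity; `byte_similarity` and `make_sample_text` are hypothetical helpers, the sample text is made up, and `zlib` merely stands in for the DEFLATE compression commonly used inside zip archives such as .docx.

```python
import difflib
import random
import zlib


def byte_similarity(a: bytes, b: bytes) -> float:
    """Return a 0..1 similarity ratio between two byte strings."""
    # autojunk=False turns off difflib's heuristic that ignores very common bytes.
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def make_sample_text(seed: int = 0, words: int = 300) -> str:
    """Hypothetical helper: build some repeatable, word-like sample text."""
    rng = random.Random(seed)
    vocab = ["similar", "content", "search", "file", "duplicate", "document",
             "folder", "percent", "cleaner", "average", "between"]
    return " ".join(rng.choice(vocab) for _ in range(words))


original = make_sample_text()

# Swap the first 'a' with the first 'e', mimicking the small edit described above.
chars = list(original)
ia, ie = original.index("a"), original.index("e")
chars[ia], chars[ie] = chars[ie], chars[ia]
edited = "".join(chars)

raw_sim = byte_similarity(original.encode(), edited.encode())
packed_sim = byte_similarity(zlib.compress(original.encode()),
                             zlib.compress(edited.encode()))

print(f"uncompressed bytes: {raw_sim:.3f}")
print(f"compressed bytes:   {packed_sim:.3f}")
```

The uncompressed texts differ in only two positions, so their ratio stays close to 1. The compressed streams are much shorter and tend to diverge after the first changed byte, so their ratio comes out lower.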
-
- Posts: 2
- Joined: Fri Mar 23, 2012 6:50 pm
Re: Similar Content Search Criteria
I see, so trying to find similarities in .docx files is difficult because of the compressed nature of the file.
I created two .docx files that are very similar and the 60% filter could not find them as duplicates. I went to each file and resaved them as .doc files and the 60% filter found them without changing the content.
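One possible way around the compression issue, outside of Duplicate Cleaner, is to compare the text inside the .docx rather than its raw bytes. The sketch below assumes the main body text lives in `word/document.xml` inside the archive, as it does in standard Word .docx files; `docx_text`, `text_similarity`, and `make_minimal_docx` are hypothetical helper names, and the last one only builds a stripped-down archive for demonstration, not a file Word would open.

```python
import difflib
import io
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(source) -> str:
    """Extract paragraph text from a .docx given a path or file-like object."""
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Text runs are stored in <w:t> elements inside each paragraph.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)


def text_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings."""
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def make_minimal_docx(text: str) -> io.BytesIO:
    """Demo helper (assumption): an in-memory zip holding only document.xml."""
    xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<w:document xmlns:w='
        '"http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        "<w:body><w:p><w:r><w:t>" + text + "</w:t></w:r></w:p></w:body>"
        "</w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("word/document.xml", xml)
    buffer.seek(0)
    return buffer


first = make_minimal_docx("The quick brown fox jumps over the lazy dog")
second = make_minimal_docx("The quick brown fox jumps over the lezy dog")

score = text_similarity(docx_text(first), docx_text(second))
print(f"text similarity: {score:.3f}")
```

With real files, `docx_text("report.docx")` would take a path instead of the in-memory buffer (the filename is only an example). Comparing extracted text ignores formatting, images, and metadata, so it answers a narrower question than a whole-file comparison.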