Duplicate Folders Problem/Bug

StefanM
Posts: 10
Joined: Wed Sep 18, 2019 8:41 am

Duplicate Folders Problem/Bug

Post by StefanM »

I'm currently testing DuplicateCleaner Pro.
It shows alleged duplicate folders which are not duplicates.

The scenario
Search Criteria: (only the following check boxes have been checked)
  • Same content
  • Any size
  • Any date
  • Don't follow NTFS mountpoints and junctions
  • Filter: included *.*
Folders to search:
For both, Folder1 and Folder2:
  • Status Included
  • Protected No
  • Master No
  • Scan against self Yes
  • Find uniques No
  • Scan subfolders Yes
After scanning for duplicates, however, the Duplicate Folders tab also lists folders that have only one duplicate file in common. The rest of their files are not duplicates.
Example:
Folder 1 contains 3 files
  • File1
  • File2
  • File3
Folder 2 contains 1 file
  • FileA
FileA and File1 are identical.

Nevertheless, these two folders are shown in the Duplicate Folders tab, even though they are NOT DUPLICATES.
In the window on the right, however, File2 and File3 (which are not dupes) are grayed out.
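
For cross-checking a case like this outside the tool, here is a minimal sketch that treats two folders as duplicates only if they hold exactly the same file contents. This is an assumption about what "duplicate folder" should mean in this scenario, not Duplicate Cleaner's actual algorithm. All function names are hypothetical, and only files directly inside each folder are compared (no subfolders).

```python
import hashlib
import tempfile
from collections import Counter
from pathlib import Path


def content_hashes(folder: Path) -> Counter:
    """Count SHA-256 digests of the files directly inside `folder`."""
    digests = Counter()
    for path in folder.iterdir():
        if path.is_file():
            # read_bytes() is fine for a sketch; large files would be hashed in chunks.
            digests[hashlib.sha256(path.read_bytes()).hexdigest()] += 1
    return digests


def folders_are_duplicates(a: Path, b: Path) -> bool:
    """True only if both folders hold exactly the same file contents."""
    return content_hashes(a) == content_hashes(b)


def shared_file_count(a: Path, b: Path) -> int:
    """Number of file contents the two folders have in common."""
    return sum((content_hashes(a) & content_hashes(b)).values())


def make_folder(root: Path, name: str, files: dict) -> Path:
    """Create root/name and fill it with {filename: bytes} (demo helper)."""
    folder = root / name
    folder.mkdir()
    for filename, data in files.items():
        (folder / filename).write_bytes(data)
    return folder


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Mirrors the example above: FileA and File1 are identical.
        folder1 = make_folder(
            root, "Folder1", {"File1": b"same", "File2": b"two", "File3": b"three"}
        )
        folder2 = make_folder(root, "Folder2", {"FileA": b"same"})
        print("shared files:", shared_file_count(folder1, folder2))
        print("duplicate folders:", folders_are_duplicates(folder1, folder2))
```

With the example data, the two folders share one file but are not reported as duplicate folders, which is the result expected in the scenario above.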
DigitalVolcano
Site Admin
Posts: 1717
Joined: Thu Jun 09, 2011 10:04 am

Re: Duplicate Folders Problem/Bug

Post by DigitalVolcano »

It sounds like some of the files are being excluded from the search for some reason. What file types are greyed out?

It may help if you post your Duplicate Cleaner log file (or send it to support if you don't want it publicly viewed).
StefanM
Posts: 10
Joined: Wed Sep 18, 2019 8:41 am

Re: Duplicate Folders Problem/Bug

Post by StefanM »

DigitalVolcano wrote: Wed Sep 18, 2019 11:18 am It sounds like some of the files are being excluded from the search for some reason. What file types are greyed out?

It may help if you post your Duplicate Cleaner log file (or send it to support if you don't want it publicly viewed).
I can assure you that no files have been excluded and there is no such info in the log file.
According to the log there are no excluded files.

And there are quite a number of alleged (!) dupe folder groups.

Group 325
Grayed out file types are *.mp4 (some are grayed out, others not)

Group 330
Grayed out file types are *.jpg (one folder has 120 jpg-files, the other folder has only one jpg file)
In this example this single jpg-file has a duplicate in the folder with 120 jpg files.
The other 119 jpg-files are grayed out.
The size of those two alleged dupe folders is displayed as the size of that single dupe jpg-file.

The fact is that I performed quite a number of move operations, and each time the number of dupe folders changed.
Quite often it increased (!) after a file move operation.

I'm currently just testing, but you can tell me what else to test, even on a deeper level. As a recently retired IT specialist, I have the time and the knowledge needed.
StefanM
Posts: 10
Joined: Wed Sep 18, 2019 8:41 am

Re: Duplicate Folders Problem/Bug

Post by StefanM »

DigitalVolcano wrote: Wed Sep 18, 2019 11:18 am It sounds like some of the files are being excluded from the search for some reason. What file types are greyed out?

It may help if you post your Duplicate Cleaner log file (or send it to support if you don't want it publicly viewed).
I conducted a few more tests with some interesting findings, which I can now confirm with 99.9 % certainty are bugs.
But documenting this would take quite some time. So please let me know first whether you are interested in those results, because otherwise I don't want to spend the time.

Note: I tested with some 3.5 TB of data with a result of some 120,000 duplicate files and some 350 duplicate folders.
It was a new scan, and at the moment I am verifying the dupe folders to see whether this time all of them really are dupes.
It looks as if false duplicate groups only occur after a file removal operation (with the files moved, of course, to a new folder that is not part of the scan location list).
I am using Spaceman99 by ExtraBit Software for crosschecking.

A very obvious bug, however, is this one:
When sorting duplicate folders by number the result is as follows: (alphabetical order of group numbers instead of ascending order of group numbers)
1
10
100
101
103 (102 is missing, as are a few other numbers, even with this new scan. Why is that?)
104
105
106
107
108
109
11
101
...
DigitalVolcano
Site Admin
Posts: 1717
Joined: Thu Jun 09, 2011 10:04 am

Re: Duplicate Folders Problem/Bug

Post by DigitalVolcano »

Thanks for checking this. Yes, I'd be interested in any data/report you have. Currently working on a new update and all feedback/bug reports are very useful.

Did the folders in question contain files that had been removed (or copied in) after the scan? Bear in mind that the program works only with the files found at the time of the scan, and it doesn't 'know about' the greyed out files when calculating the duplicate folders (it just shows them for information).
DigitalVolcano
Site Admin
Posts: 1717
Joined: Thu Jun 09, 2011 10:04 am

Re: Duplicate Folders Problem/Bug

Post by DigitalVolcano »

Re: the duplicate folder numbering - it's just doing a text sort on the group title as there is no group field shown. Hence the odd sorting. Also the missing group numbers are sub-groups that are completely contained by larger folder groups and so not shown. All this is fixed and simplified in version 5.0 :)
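
To illustrate the difference between the text sort described above and a numeric sort, here is a small sketch. The title format "Group N" follows the group names quoted earlier in the thread. The helper `group_number` is hypothetical and is not taken from Duplicate Cleaner.

```python
import re


def group_number(title: str) -> int:
    """Extract the first run of digits from a title such as 'Group 103'."""
    match = re.search(r"\d+", title)
    if match is None:
        raise ValueError(f"no group number in {title!r}")
    return int(match.group())


titles = [f"Group {n}" for n in (1, 10, 11, 100, 101, 103, 104)]

# Plain text sort: compares character by character, so 'Group 104' < 'Group 11'.
text_sorted = sorted(titles)

# Numeric sort: compares the integer value of the group number.
numeric_sorted = sorted(titles, key=group_number)

print(text_sorted)
print(numeric_sorted)
```

The text sort puts "Group 11" after "Group 104", as seen in the list reported above, while sorting on the extracted integer gives ascending group numbers.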
StefanM
Posts: 10
Joined: Wed Sep 18, 2019 8:41 am

Re: Duplicate Folders Problem/Bug

Post by StefanM »

DigitalVolcano wrote: Thu Sep 19, 2019 10:59 am Did the folders in question contain files that had been removed (or copied in) after the scan?
No, they did not!
DigitalVolcano wrote: Thu Sep 19, 2019 10:59 am Bear in mind that the program works only with the files found at the time of the scan
I am aware of that :)
DigitalVolcano wrote: Thu Sep 19, 2019 10:59 am ..., and it doesn't 'know about' the greyed out files when calculating the duplicate folders (it just shows them for information)
OK, but could you please let me know all the possible reasons why a file could get grayed out?

And one more question:
When I want to reproduce a scenario, is it sufficient to make copies of the file DuplicateCleaner4_Pro.data to re-use them later for testing particular scenarios again? (of course provided that all folders included in that particular scan still have exactly the same content)
StefanM
Posts: 10
Joined: Wed Sep 18, 2019 8:41 am

Re: Duplicate Folders Problem/Bug

Post by StefanM »

More findings:

Setting checked before scan "Automatically hide single file groups"
Dupe check result after scan:
46852 dupe files
320 dupe folders

Uncheck setting "Automatically hide single file groups"
Refresh list
Dupe check result:
47418 dupe files (increased is ok)
315 dupe folders (decreased, why???)

Check setting again "Automatically hide single file groups"
Refresh list
Dupe check result:
46852 dupe files (decreased is ok)
315 dupe folders

Uncheck setting again "Automatically hide single file groups"
Refresh list
Dupe check result:
46852 dupe files (not increased again is not ok)
315 dupe folders

In the last two check/uncheck tests the results no longer changed, which is not correct.

Here are just a few suggestions for version 5:

1. 'Mark by location window'
The text is a bit confusing. I suggest using wording that is easier to understand:
Instead of 'Mark all files that duplicate this folder tree elsewhere'
I suggest 'Mark all files outside folder tree which have a copy inside'
Instead of 'Mark the files in this folder tree that have duplicates elsewhere'
I suggest 'Mark all files inside folder tree which have a copy outside'

2. 'Scan summary'
It reads e.g.: '48625 files have duplicates (557 GB)'
This is misleading, as the user might think they could save 557 GB by deleting all duplicates.
I suggest adding information on the space used by the excess copies only: 'Total size of excess duplicate files'

3. Option 'Protect important system folders'
This option does not only exclude system folders, it also excludes any hidden files which are not necessarily system files.
If you have e.g. duplicate folders with hidden files, they are excluded.
And this is the default setting!
I would suggest restricting the protection to system folders only.
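
To make suggestion 2 concrete, here is a small sketch of the two figures. It assumes that every file in a duplicate group has the same size and that one copy per group is kept. The class and function names and the sample sizes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DuplicateGroup:
    file_size: int  # size of one file in the group, in bytes
    copies: int     # number of identical files in the group


def total_size(groups) -> int:
    """Size of all files that have duplicates (every copy counted)."""
    return sum(g.file_size * g.copies for g in groups)


def excess_size(groups) -> int:
    """Space that could be freed if one copy per group is kept."""
    return sum(g.file_size * (g.copies - 1) for g in groups)


groups = [
    DuplicateGroup(file_size=100, copies=2),  # 200 bytes in total, 100 excess
    DuplicateGroup(file_size=50, copies=3),   # 150 bytes in total, 100 excess
]

print("files with duplicates:", total_size(groups), "bytes")
print("excess duplicate files:", excess_size(groups), "bytes")
```

With these sample groups the total is 350 bytes but only 200 bytes could be reclaimed, which is why showing only the first figure can mislead.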
StefanM
Posts: 10
Joined: Wed Sep 18, 2019 8:41 am

Re: Duplicate Folders Problem/Bug

Post by StefanM »

German Translation

As a native German speaker, I just checked the German language file.
The translation is very good, just very few issues. All of them are minor issues, no real problems.
However, at the end there are a number of lines still in English.

If you want, I can translate them for you and I could also correct those minor issues.
Just let me know.

I've already been a translator for Jetico, for Extreme Internet Software (even translated quite a number of web pages, e.g. https://de.free-photo-screensaver.com/) ...

I don't know whether we should continue communicating via the forum.
I guess you have my email address... ;)
DigitalVolcano
Site Admin
Posts: 1717
Joined: Thu Jun 09, 2011 10:04 am

Re: Duplicate Folders Problem/Bug

Post by DigitalVolcano »

Thanks for the info and suggestions - I'll go through this tomorrow and will respond!

Copying the DB file is fine - it contains all settings and data.