Inconsistent filter behavior: same file name and extension

Frozenburger
Posts: 4
Joined: Fri Aug 30, 2024 5:28 pm

Inconsistent filter behavior: same file name and extension

Post by Frozenburger »

Greetings!

I am facing a peculiar issue.

CONTEXT

I am using "regular mode" (tested byte-to-byte, SHA-256 and Blake2b-512) to compare 2 folders (and their subfolders): folder "X" has the same files as "Y", plus some others.

Both are set as "External only + Master". However, setting both to "external only" yielded the same results. Only "Y" is set to "show remaining".

No "search filters" were applied (as such, I included "all file extensions", "any file size" and "any date").

In each operation I tested a different combination of "more duplicate options".

RESULTS

Here are the options applied and the respective results:

1. Nothing checked under "more duplicate options"
1.1. Duplicate: 66.944
1.2. Remaining: 3

2. "Same file name"
2.1. Duplicate: 66.918
2.2. Remaining: 16

3. "Same file extension"
3.1. Duplicate: 65.160
3.2. Remaining: 895

4. "Same file name" and "same file extension"
4.1. Duplicate: 66.918
4.2. Remaining: 16

ISSUE 1

Unless I am missing something obvious, there seem to be some inconsistencies.

No. 4 (same name and ext.) is the strictest of all the operations and, as such, should find a number of duplicates less than or equal to that of every other operation, and its number of remaining files should be greater than or equal to theirs.

However, it found more duplicates and fewer remaining files than no. 3 (same ext.).

On the other hand, no. 4 (same name and ext.) found the same amounts as no. 2 (same name). The remaining files are identical.
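
The expectation above can be written down as a small check. This is only a sketch: the figures are the ones listed under RESULTS (reading "66.944" as 66,944), and the option labels "name" and "ext" are shorthand introduced here for illustration.

```python
# Reported results: set of checked options -> (duplicates, remaining).
# "name" and "ext" are hypothetical labels for "Same file name" and
# "Same file extension".
results = {
    frozenset(): (66944, 3),
    frozenset({"name"}): (66918, 16),
    frozenset({"ext"}): (65160, 895),
    frozenset({"name", "ext"}): (66918, 16),
}


def monotonicity_violations(results):
    """Return (looser, stricter) pairs where the stricter option set
    reports more duplicates or fewer remaining files."""
    violations = []
    for loose, (loose_dup, loose_rem) in results.items():
        for strict, (strict_dup, strict_rem) in results.items():
            if loose < strict:  # loose is a proper subset of strict
                if strict_dup > loose_dup or strict_rem < loose_rem:
                    violations.append((sorted(loose), sorted(strict)))
    return violations


violations = monotonicity_violations(results)
print(violations)
```

With these figures the only pair flagged is no. 3 against no. 4; every other comparison behaves as expected.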

ISSUE 2

Another odd behavior: no. 3 (same ext.) results in 895 remaining files. Only 3 of those (2x "desktop.ini" and a shortcut) also appear in no. 2 and no. 4. All the rest are files with unspecified formats (file type = "File").

However, I randomly checked some of those files and they actually are duplicates: both folder "X" and "Y" have those files, with the same hashes and same extension (no extension, for that matter).

As such, they should be shown as "duplicate files" instead of "remaining files".
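
For reference, a manual spot check of this kind can be scripted roughly as follows. This is a minimal sketch using Python's standard hashlib, not the program's own comparison code; the helper names `file_digest` and `same_content_and_extension` are made up for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks; "blake2b" gives a 512-bit digest by default."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_content_and_extension(a, b, algorithm="sha256"):
    """True if both files have the same extension (possibly none)
    and the same content hash."""
    a, b = Path(a), Path(b)
    same_ext = a.suffix.lower() == b.suffix.lower()
    return same_ext and file_digest(a, algorithm) == file_digest(b, algorithm)
```

Here two extensionless files with identical content compare as equal, because an empty extension is treated like any other extension value.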

CONCLUSION

Am I missing something? I checked the documentation, but found no explanation for these results.

Any help is appreciated.
DigitalVolcano
Site Admin
Posts: 1863
Joined: Thu Jun 09, 2011 10:04 am

Re: Inconsistent filter behavior: same file name and extension

Post by DigitalVolcano »

Thanks, I will check this out. Which version are you using?
Frozenburger
Posts: 4
Joined: Fri Aug 30, 2024 5:28 pm

Re: Inconsistent filter behavior: same file name and extension

Post by Frozenburger »

Greetings!

Using 5.23.0 - 64bit.
Frozenburger
Posts: 4
Joined: Fri Aug 30, 2024 5:28 pm

Re: Inconsistent filter behavior: same file name and extension

Post by Frozenburger »

Another group of results (with more items):

1. Nothing checked
Groups of duplicates 32,168
Files that have duplicates 67,172
Folder groups 63
Remaining files 6
Duplicate files (copies) 35,004
Duplicate files (originals) 32,168

2. Same name
Groups of duplicates 32,879
Files that have duplicates 67,145
Folder groups 62
Remaining files 33
Duplicate files (copies) 34,266
Duplicate files (originals) 32,879

3. Same extension
Groups of duplicates 31,304
Files that have duplicates 65,387
Folder groups 142
Remaining files 1,791
Duplicate files (copies) 34,083
Duplicate files (originals) 31,304

4. Same name + extension
Groups of duplicates 32,881
Files that have duplicates 67,145
Folder groups 62
Remaining files 33
Duplicate files (copies) 34,264
Duplicate files (originals) 32,881
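
A quick arithmetic check on the figures above, as a sketch. The relationships tested here (copies plus originals equals files that have duplicates, and one original per group) are inferred from the numbers themselves, not taken from the documentation, and the dictionary keys are labels chosen for illustration.

```python
# Figures copied from the four runs above.
stats = {
    "nothing checked": {"groups": 32168, "with_dupes": 67172, "copies": 35004, "originals": 32168},
    "same name": {"groups": 32879, "with_dupes": 67145, "copies": 34266, "originals": 32879},
    "same extension": {"groups": 31304, "with_dupes": 65387, "copies": 34083, "originals": 31304},
    "same name + extension": {"groups": 32881, "with_dupes": 67145, "copies": 34264, "originals": 32881},
}


def is_consistent(row):
    """Check the two inferred relationships for one run."""
    adds_up = row["copies"] + row["originals"] == row["with_dupes"]
    one_original_per_group = row["groups"] == row["originals"]
    return adds_up and one_original_per_group


consistency = {name: is_consistent(row) for name, row in stats.items()}
print(consistency)
```

All four runs pass, so the totals are internally consistent; the oddity is limited to how "same extension" compares with the other runs.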
DigitalVolcano
Site Admin
Posts: 1863
Joined: Thu Jun 09, 2011 10:04 am

Re: Inconsistent filter behavior: same file name and extension

Post by DigitalVolcano »

Thanks for the data.

The reason you are getting these results is that 'Same file extension' used on its own won't match files with no file extension. It will match files with no extension when used in combination with 'Same filename' though.

Whether or not this is a feature or a bug is up for debate!
Generally DC doesn't like to match things where the data is missing (e.g. no extension, missing metadata, etc.).
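
To make the rule concrete, here is a rough model of the behaviour described above. It is an illustrative sketch only, not DC's actual implementation: `match_key` and `split_duplicates` are hypothetical helpers, and the content hashes are placeholder strings. The sample layout is two folders, each holding `file.txt` and an extensionless `noext`.

```python
from collections import defaultdict
from pathlib import PureWindowsPath


def match_key(path, digest, same_name=False, same_ext=False):
    """Grouping key for a file, or None if it cannot be matched at all."""
    p = PureWindowsPath(path)
    if same_name:
        # A full-name match already implies the same extension (or none).
        return (digest, p.name.lower())
    if same_ext:
        if not p.suffix:
            return None  # assumed rule: missing extension never matches
        return (digest, p.suffix.lower())
    return (digest,)


def split_duplicates(files, same_name=False, same_ext=False):
    """Split {path: digest} into (duplicates, remaining) path lists."""
    groups, remaining = defaultdict(list), []
    for path, digest in files.items():
        key = match_key(path, digest, same_name, same_ext)
        if key is None:
            remaining.append(path)
        else:
            groups[key].append(path)
    duplicates = []
    for paths in groups.values():
        (duplicates if len(paths) > 1 else remaining).extend(paths)
    return sorted(duplicates), sorted(remaining)


sample = {
    r"Folder a\file.txt": "hash-1",
    r"Folder a\noext": "hash-2",
    r"Folder b\file.txt": "hash-1",
    r"Folder b\noext": "hash-2",
}
ext_only = split_duplicates(sample, same_ext=True)
name_and_ext = split_duplicates(sample, same_name=True, same_ext=True)
```

In this model `ext_only` leaves both `noext` files in the remaining list, while `name_and_ext` reports all four files as duplicates, which mirrors the counts in the thread.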

You can test this with just four files:

Folder a\file.txt
Folder a\noext
Folder b\file.txt
Folder b\noext

Going forward, it's possible another setting is needed e.g. Match File extension (including blanks)
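
If it helps, the four-file layout can be generated with a short script. This is a sketch under assumptions: the file contents are made up (each name gets identical bytes in both folders so the pairs are true duplicates), and `build_test_tree` is a hypothetical helper name.

```python
import tempfile
from pathlib import Path

# Assumed contents: any bytes work as long as both copies are identical.
CONTENTS = {
    "file.txt": b"sample text file\n",
    "noext": b"sample file without extension\n",
}


def build_test_tree(root):
    """Create 'Folder a' and 'Folder b' under root, each holding
    file.txt and noext; return the created paths."""
    created = []
    for folder in ("Folder a", "Folder b"):
        directory = Path(root) / folder
        directory.mkdir(parents=True, exist_ok=True)
        for name, data in CONTENTS.items():
            path = directory / name
            path.write_bytes(data)
            created.append(path)
    return created


if __name__ == "__main__":
    root = tempfile.mkdtemp(prefix="dc_test_")
    for path in build_test_tree(root):
        print(path)
```

Point the two scan paths at "Folder a" and "Folder b" and compare the results with 'Same file extension' alone versus combined with 'Same filename'.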
Frozenburger
Posts: 4
Joined: Fri Aug 30, 2024 5:28 pm

Re: Inconsistent filter behavior: same file name and extension

Post by Frozenburger »

All right. Thanks for the explanation :D
Joshua26
Posts: 1
Joined: Tue Sep 10, 2024 5:08 pm

Re: Inconsistent filter behavior: same file name and extension

Post by Joshua26 »

DigitalVolcano wrote: Wed Sep 04, 2024 3:12 pm Thanks for the data.

The 'Same file extension' criterion will not match files that do not have an extension. Combine it with 'Same filename' if extensionless files should also be matched.