DigitalVolcano wrote: Sat Sep 21, 2019 5:03 pm
3- Will probably split System folders and hidden files into two options (with hidden on by default).
'Hidden on' by default is, in my eyes, not such a good idea:
Folders with pictures in them, for example, often also contain a hidden database file. When DCP compares such folders and the user deletes a supposed duplicate folder, an orphaned database file will remain behind, so the duplicate folder is never fully deleted.
The majority of users will probably never change those default settings.
DigitalVolcano wrote: Sat Sep 21, 2019 5:03 pm
Note - a grey file means it isn't in the DC database.
A few questions
As long as you are not giving away any company secrets, maybe you could answer a few questions for me. On the one hand I am very interested in how DCP works, and on the other hand this information would make it a bit easier for me to find bugs.
But first, a question about a use case of mine:
Let's say I have folders with (duplicate) pictures: folder A contains only 3 pictures, folder B contains 20. All 3 pictures in folder A have duplicates in folder B.
Of course, I want to delete the duplicates in folder A, the one with only 3 pictures.
If I know all of the above, it's easy to decide that folder B is the one to keep, e.g. by using 'Mark by location' in the Selection Assistant.
But is there a way to let DCP assist me in making that decision?
Of course, I could use 'Show Folder in Windows Explorer', but that is not practical, as I would have to do it for every duplicate.
Is there any way to achieve my goal with some setting or configuration?
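Just to make the goal concrete, here is a little Python sketch of the decision logic I have in mind (purely my own illustration with made-up paths and counts, not anything DCP actually does): a folder whose files all have duplicates elsewhere is the delete candidate.

CODE:
# Purely illustrative: flag folders whose files ALL have duplicates elsewhere.
# Paths and counts are made up for the folder A / folder B example above.
import os
from collections import defaultdict

duplicate_groups = [                      # each inner list is one duplicate group
    ["C:/Pics/A/img1.jpg", "C:/Pics/B/img1.jpg"],
    ["C:/Pics/A/img2.jpg", "C:/Pics/B/img2.jpg"],
    ["C:/Pics/A/img3.jpg", "C:/Pics/B/img3.jpg"],
]
files_per_folder = {"C:/Pics/A": 3, "C:/Pics/B": 20}   # total files in each folder

duplicated = defaultdict(int)             # folder -> number of duplicated files
for group in duplicate_groups:
    for path in group:
        duplicated[os.path.dirname(path)] += 1

for folder, total in files_per_folder.items():
    fully_covered = duplicated[folder] == total
    print(f"{folder}: {duplicated[folder]}/{total} duplicated -> "
          f"{'delete candidate' if fully_covered else 'keep'}")

This would print folder A as the delete candidate (3/3 duplicated) and folder B as the one to keep (3/20 duplicated). That is the kind of hint I am missing in the duplicate folder view.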
And now here are my questions about how DCP works.
Maybe it will save you some time if you just comment on my observations (right or wrong).
In the following, I assume that I am simply searching for identical files with identical MD5 hashes.
Step 1:
DCP creates a list of all files and folders, including size, creation time and modification time.
Step 2:
It checks which files have identical sizes.
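(To show what I picture for steps 1 and 2, here is a rough Python sketch of my mental model: walk the scan locations, record the metadata, then group the files by size. This is only my guess, of course, not DCP's actual code, and the scan location in the usage line is made up.)

CODE:
# My mental model of steps 1 and 2 (a guess, not DCP's implementation).
import os
from collections import defaultdict

def scan(paths):
    """Step 1: list all files with their size, creation and modification time."""
    files = []
    for root in paths:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                st = os.stat(full)
                files.append((full, st.st_size, st.st_ctime, st.st_mtime))
    return files

def group_by_size(files):
    """Step 2: only files that share a size can possibly be duplicates."""
    by_size = defaultdict(list)
    for path, size, _ctime, _mtime in files:
        by_size[size].append(path)
    return {size: paths for size, paths in by_size.items() if len(paths) > 1}

# candidates = group_by_size(scan(["C:/Pics"]))   # made-up scan location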
Step 3:
For all of those files, quick hashes are calculated in order to exclude files that already turn out to be different at this stage. In parallel, already in this step, a full MD5 hash is calculated for a number of files (those smaller than…?).
May I ask how the quick hash is calculated? Is it similar to the ed2k hash calculation, where larger files are split into chunks and a hash is calculated for each chunk separately?
I would do it by defining maybe just 3 small chunks per file (close to the start of the file, in the middle of the file, and close to the end of the file).
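To illustrate what I mean, here is a small Python sketch of that 3-chunk idea (just my own suggestion, with an arbitrary 64 KiB block size; I have no idea whether DCP's quick hash works anything like this):

CODE:
# My 3-chunk quick-hash idea (a suggestion, not DCP's actual method):
# hash a small block near the start, the middle and the end of the file.
import hashlib

CHUNK = 64 * 1024   # 64 KiB per sample block (arbitrary choice)

def quick_hash(path, size):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for offset in (0, max(0, size // 2 - CHUNK // 2), max(0, size - CHUNK)):
            f.seek(offset)
            h.update(f.read(CHUNK))   # for small files the blocks simply overlap
    return h.hexdigest()

Two files with different quick hashes cannot be identical; files with equal quick hashes would still need the full hash from step 4.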
Step 4:
For all remaining files that 'passed' the quick hash comparison, a full MD5 hash is calculated.
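For completeness, this is how I picture the full hash in step 4, simply a streaming MD5 over the whole file (standard library stuff, nothing DCP-specific):

CODE:
# Streaming MD5 over the whole file for step 4 (nothing DCP-specific).
import hashlib

def full_md5(path, block_size=1024 * 1024):
    h = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            h.update(block)
    return h.hexdigest()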
Step 5:
Based on the results, the duplicate file list and the duplicate folder list are populated.
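And my picture of step 5 (again only a guess): group the files by their full hash, every group with more than one member becomes an entry in the duplicate file list, and the duplicate folder list can then be derived from those groups.

CODE:
# My guess at step 5: group files by full hash, keep groups with more than one file.
import os
from collections import defaultdict

def build_duplicate_groups(hashes):
    """hashes: mapping of file path -> full MD5 hex digest (from step 4)."""
    by_hash = defaultdict(list)
    for path, digest in hashes.items():
        by_hash[digest].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]

def folders_with_duplicates(groups):
    """Derive the duplicate folder list from the duplicate file groups."""
    return sorted({os.path.dirname(path) for group in groups for path in group})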
Some additional questions about that:
As I read (and found out myself), the database file is encrypted, which is too bad, because that way I cannot edit it.
1. But can you tell me how DCP saves the hashes it has once computed?
Is it keyed by the exact path information? And if I change just a single letter in the path, will DCP no longer 'know' that file?
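What I imagine (and what my question boils down to) is something like a cache keyed by path plus size and modification time, so that a rename or modification makes the cached hash unreachable and it has to be recomputed. Here is a tiny Python sketch of that assumption (only my guess; I know nothing about the real database format, and the class and method names are made up):

CODE:
# My assumption about the hash cache (not the actual DCP database format):
# key by (path, size, mtime); renaming or modifying a file makes the entry stale.
import os

class HashCache:
    def __init__(self):
        self._store = {}   # (abs path, size, mtime) -> md5 hex digest

    def _key(self, path):
        st = os.stat(path)
        return (os.path.abspath(path), st.st_size, st.st_mtime)

    def get(self, path):
        return self._store.get(self._key(path))   # None after rename/modification

    def put(self, path, digest):
        self._store[self._key(path)] = digest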
2. Is there a size limit for the database file? What is this limit? And what happens once it has been reached (first in, first out)?
And I learned that there will be different database files in version 5:

QUOTE: "
to split the databases (caches, settings and scans) up and generate a new one for each scan."
OK, I hope this wasn't too many questions…