Any way to prevent repeat of "Discovering Files"?
Posted: Mon Nov 27, 2023 4:03 pm
How can I efficiently use Duplicate Cleaner?
I am searching 25 TB on 14 drives. The first scan took 24 hours to find 10 million duplicate files and 46,000 duplicate folders. A re-scan of the directories (Discovering Files alone, before matching duplicates starts) takes 1.5 hours.
After the first scan, I reviewed the results and decided to first dedupe four folders on one drive. So, I added those four folders and then chose "exclude" for each of the other drives on the Scan Location tab. (I did not delete them.) I deduped those directories until no duplicates remained.
Then I chose "include" for all of the drives (and deleted the four folders). Scanning then read the directories of all of the drives again. This "Discovering Files" phase takes 1.5 hours, and I can't wait 1.5 hours after every piece of work.
When this scan is done, I will look again at how to use DC efficiently. Once I identify folders to work with, I would like to filter the results to only those particular folders. I can handle investigating a few target folders (in WinCatalog or Explorer) to see what they contain, then decide which folder(s) to keep and which ones to delete from. But I don't want the clutter of thinking about the entire list at once.
For example, when I had 4 folders to work with, they were 500, 200, 50 and 20 GB. I can look at those four folders, decide what to delete from (typically starting with whole folders), and repeat until there are zero duplicates left.
In 1.5 hours, I will look at the filtering options. I don't think I can filter by drive or folder, but maybe I can.
So my feature requests are:
(1) Allow the results to be filtered to only show a subset of files.
(2) Add an option to let the user skip the rescan. The user knows whether they made any changes or not. Ideally, enable selection of which drives to re-scan. For example, a primary backup drive may never change, so there's no need to rescan it. Currently (in ver 5), a virtual folder could be used for that drive.
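To illustrate what I mean by request (2), here is a minimal sketch of a per-drive scan cache. This is not Duplicate Cleaner's actual implementation, just the general idea: walk a drive once, save the file index (path, size, modification time) to disk, and on later runs reload that index instead of re-walking the drive unless the user explicitly asks for a rescan. The function and cache-file names here are made up for the example.

```python
import os
import pickle


def discover(root):
    """Walk a drive once, recording each file's size and modification time.

    This is the slow "Discovering Files" step that a cache would let us skip.
    """
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            index[path] = (st.st_size, st.st_mtime)
    return index


def load_or_discover(root, cache_file, rescan=False):
    """Reuse a cached file index for this drive unless a rescan is requested."""
    if not rescan and os.path.exists(cache_file):
        with open(cache_file, "rb") as f:
            return pickle.load(f)  # instant, instead of 1.5 hours of walking
    index = discover(root)
    with open(cache_file, "wb") as f:
        pickle.dump(index, f)
    return index
```

A backup drive that never changes would always be served from its cache file; only drives the user actually modified would pass `rescan=True` and be re-walked.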