Any way to prevent repeat of "Discovering Files"?

wwcanoer
Posts: 51
Joined: Wed Aug 19, 2020 5:49 am

Any way to prevent repeat of "Discovering Files"?

Post by wwcanoer »

How can I efficiently use Duplicate Cleaner?

I am searching 25 TB on 14 drives. The first scan took 24 hours and found 10 million duplicate files and 46,000 duplicate folders. A re-scan of the directories ("Discovering Files" alone, before duplicate matching starts) takes 1.5 hours.

After the first scan, I reviewed the results and decided to first dedupe four folders on one drive. So, I added those four folders and then chose "exclude" for each of the other drives on the Scan Location tab. (I did not delete them.) I deduped those directories until no duplicates remained.

Then I chose "include" for all of the drives (and deleted the four folders). Scanning then read the directories of all of the drives again. This "Discovering Files" step takes 1.5 hours, and I can't wait 1.5 hours after every piece of work.

When this scan is done, I will look again at how to use DC efficiently. Once I identify folders to work with, I would like to filter the results to only those particular folders. I can handle investigating a few target folders (in WinCatalog or Explorer) to see what they contain, then decide which folder(s) to keep and which to delete from, but I don't want to be cluttered thinking about the entire list at once.

For example, when I had 4 folders to work with, they were 500, 200, 50 and 20 GB, so I could look at those four folders, decide what to delete from (typically starting with folders), and repeat until there were zero duplicates left.

In 1.5 hours, I will look at the filtering options. I don't think that I can filter by drive or folder, but maybe I can.

I expect that my feature requests are:
(1) Allow the results to be filtered to only show a subset of files (see the sketch below for the kind of filtering I mean).
(2) Add an option to let the user prevent a rescan. The user knows whether they made any changes. Ideally, enable selection of which drives to re-scan. For example, a primary backup drive may never change, so there is no need to rescan it. Now (in version 5), a virtual folder could be used for that drive.
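
To illustrate request (1): if the duplicate file list is exported to CSV, a small script like this hypothetical one could cut the results down to a few target folders. The "Path" column name is a guess on my part; check the header row of your own export.

Code: Select all
import csv

# Folders of interest; these prefixes are just examples from my setup.
TARGET_PREFIXES = (r"L:\_Seagate4TB", r"X:\Sync-T530")

with open("duplicates.csv", newline="", encoding="utf-8") as src, \
     open("duplicates_subset.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # "Path" is an assumed column name -- check your own export's header.
        if row["Path"].startswith(TARGET_PREFIXES):
            writer.writerow(row)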
wwcanoer
Posts: 51
Joined: Wed Aug 19, 2020 5:49 am

Re: Any way to prevent repeat of "Discovering Files"?

Post by wwcanoer »

Took 2.5 hours to complete the scan. Every drive had at least 2 files needing new MD5 calculations, and a data drive that I didn't touch had 231 of them. Maybe it's because my active Data and Windows drives keep adding files: a new file can have a size that now matches a previously unique-sized file on the untouched drive, so that old file has to be hashed for the first time. As the user, I know that there will be no new matching files, but DC doesn't know.
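
My guess at why this happens (an assumption about how DC works internally, not its actual code): duplicate finders typically group files by size first and only hash the files whose size is shared, roughly like this:

Code: Select all
import hashlib
import os
from collections import defaultdict

def md5_of(path, chunk=1 << 20):
    """Hash a file in 1 MB chunks so huge files don't load into RAM."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def find_duplicates(roots):
    # Pass 1: group every file by size -- cheap, no reading of contents.
    by_size = defaultdict(list)
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                p = os.path.join(dirpath, name)
                try:
                    by_size[os.path.getsize(p)].append(p)
                except OSError:
                    pass  # unreadable file; skip it
    # Pass 2: only files that share a size get hashed. A new file that
    # matches a previously unique size drags the old file into this pass,
    # which would explain new MD5 work on a drive I never touched.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for p in paths:
            by_hash[md5_of(p)].append(p)
    return [group for group in by_hash.values() if len(group) > 1]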

The duplicate file list can be filtered by document type category (doc, photo, video, ...) or by customizable file-size buckets, but there are no custom filters (e.g., by location).

Looks like I need to go back to my old method: identify folders to dedupe, add those to Scan Locations, and then run the scan. Hopefully DC now remembers the MD5s so that they don't have to be recalculated. For example, I scan and dedupe my main photo storage folders, then scan all drives to clean up copies that may be scattered around, and just suffer the rescan time.
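
If DC's cache works the way I hope (again, an assumption on my part, not something I've confirmed), it would keep each file's MD5 keyed by path and reuse it whenever size and modification time are unchanged. A minimal sketch:

Code: Select all
import hashlib
import json
import os

CACHE_FILE = "md5_cache.json"  # hypothetical location; DC's real cache differs

def file_md5(path):
    h = hashlib.md5()
    with open(path, "rb") as f:
        while block := f.read(1 << 20):
            h.update(block)
    return h.hexdigest()

def load_cache():
    try:
        with open(CACHE_FILE, encoding="utf-8") as f:
            return json.load(f)
    except (OSError, ValueError):
        return {}  # no cache yet (or unreadable): start fresh

def cached_md5(path, cache):
    st = os.stat(path)
    entry = cache.get(path)
    # Reuse the stored hash only if size and mtime are both unchanged.
    if entry and entry["size"] == st.st_size and entry["mtime"] == st.st_mtime:
        return entry["md5"]
    digest = file_md5(path)
    cache[path] = {"size": st.st_size, "mtime": st.st_mtime, "md5": digest}
    return digest

def save_cache(cache):
    with open(CACHE_FILE, "w", encoding="utf-8") as f:
        json.dump(cache, f)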
wwcanoer
Posts: 51
Joined: Wed Aug 19, 2020 5:49 am

Re: Any way to prevent repeat of "Discovering Files"?

Post by wwcanoer »

An example: the Duplicate Folders list shows four copies of a 24 GB folder, but I can see that they all have a parent folder named "2019-Mi", which is on 4 drives. So, I want to filter on all folders named "2019-Mi" and dedupe them all at once:
  • L:\_Seagate4TB\_Phones\_2019-Mi\Mi-32GB-USB2\
  • X:\Sync-T530\F\Mi\_2019-Mi\Mi-32GB-USB2\
  • Y:\P4\_Phones-ID-CanDelete\P4\_Phones-ID\_2019-Mi\Mi-32GB-USB2\
  • Z:\P\_Phones-ID\_2019-Mi\Mi-32GB-USB2\

Using WinCatalog, I see that I have 13 folders named "2019-Mi" across 5 drives.

So, I normally export that folder list from WinCatalog into Duplicate Cleaner's "Scan Locations" and dedupe them all at once.
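
For anyone without WinCatalog, the same folder list could be built with a small script; the drive letters and folder name below are just the ones from my example:

Code: Select all
import os

def folders_named(name, roots):
    """Collect every directory called `name` (case-insensitive) under `roots`."""
    hits = []
    for root in roots:
        for dirpath, dirs, _files in os.walk(root):
            for d in dirs:
                if d.lower() == name.lower():
                    hits.append(os.path.join(dirpath, d))
    return hits

# Drive letters are the ones from my example; adjust to taste.
for folder in folders_named("_2019-Mi", ["L:\\", "X:\\", "Y:\\", "Z:\\"]):
    print(folder)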

I can open a new, clean scan in Duplicate Cleaner, but then when I reload the original large scan, I need to wait 2.5 hours for it to re-scan. If I could filter the locations within the large scan, DC would keep track of what was deleted and I wouldn't need to re-scan.

Looking for a way to do this efficiently.

Working with the large scan, it is slow to mark folders and move to another tab, so maybe it is more efficient for me to go through the Duplicate Files/Folders lists to identify many sets of target folders, dedupe each set in a clean session, and then update the large scan. Even without the large scan loaded, I think that DC remembers the MD5 data, so hopefully it doesn't have to recalculate it with each new clean session.