Speed up for very large fragmented and duplicated music collection

kbs
Posts: 9
Joined: Thu Jul 30, 2015 9:55 am


Post by kbs »

I have a very large music collection which has been partly copied/backed up to several places and modified and I'm trying to get it back into one coherent whole.
Files are arranged in trees according to type (FLAC and MP3) and desirability (when space was short, demos and live recordings were hived off elsewhere). I've discovered caches here and there, and being an OCD completist I want to a) keep the best versions of what I have, b) not lose anything I already have, and c) remove dupes to save space (eliminate compilation copies etc.).

I no longer have working-space constraints (very fast, large SSD array) and have acquired killer CPU availability (Ryzen 7, 6 cores / 12 threads). Duplicate Cleaner Pro barely registers on CPU usage (only a couple of threads in use out of 12), and comparison progress is slow when running it across the lot.

I'm guessing that dividing and conquering would be quicker: pick a definitive directory tree, dedupe it internally (scan against self), set it as master, then add another tree as external if using V5 (in V4, scan against self isn't needed), merge it into the master (single tree, or as a separate master?) once deduped, and gradually add all the other trees. Does that make sense? I'm guessing that if I have more than one master tree, effort is wasted comparing the two or more masters with each other...?
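For what it's worth, the master-tree idea can be sketched in plain Python as content hashing: index the definitive tree once, then check each candidate tree against that index. This is only an illustration of the logic, not how Duplicate Cleaner works internally; the paths and function names here are invented for the example.

```python
import hashlib
from pathlib import Path

def file_hash(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def index_tree(root):
    """Map content-hash -> list of paths for every file under root (the master)."""
    index = {}
    for p in Path(root).rglob("*"):
        if p.is_file():
            index.setdefault(file_hash(p), []).append(p)
    return index

def dupes_against_master(master_index, candidate_root):
    """Yield files in the candidate tree whose content already exists in the master."""
    for p in Path(candidate_root).rglob("*"):
        if p.is_file() and file_hash(p) in master_index:
            yield p
```

Once a candidate tree has been swept this way, anything it yields is safe to delete (the content survives in the master), and whatever remains can be merged in before moving to the next tree, which matches the "gradually add all the other trees" plan.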

However, is there any way of utilising the CPU power available to speed things up? A Similarity scan on the whole lot was done in hours with the CPU and GPU added into the mix and the fans running hard...
Regards, Keith
Callistemon
Posts: 85
Joined: Fri Jun 25, 2021 5:15 am

Re: Speed up for very large fragmented and duplicated music collection

Post by Callistemon »

I'm guessing that dividing and conquering would be quicker: pick a definitive directory tree, dedupe it internally (scan against self), set it as master, then add another tree as external if using V5 (in V4, scan against self isn't needed), merge it into the master (single tree, or as a separate master?) once deduped, and gradually add all the other trees.
Scan-against-self functionality isn't exclusive to version 4. Duplicate Cleaner 5 has scan against self enabled by default when the "Find duplicates" dropdown is set to Yes. In addition, only version 5 includes "Internal only" as an option.

Yes = scan against self and against other folders
External only = only scan against other folders
Internal only = scan against self but not against other folders
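To make the semantics of those three dropdown settings concrete, here is a small sketch of them as a pair filter over files grouped by folder. This is purely illustrative (the mode names are lowercased stand-ins, not Duplicate Cleaner's internals): "yes" compares everything, "external" only compares across groups, "internal" only compares within a group.

```python
from itertools import combinations

def pairs_to_compare(files_by_group, mode):
    """Yield file pairs to compare under one of three scan modes:
    'yes'      -> all pairs, within and across groups
    'external' -> only pairs from different groups
    'internal' -> only pairs from the same group
    """
    items = [(group, f) for group, files in files_by_group.items() for f in files]
    for (ga, a), (gb, b) in combinations(items, 2):
        if mode == "external" and ga == gb:
            continue  # skip within-group pairs
        if mode == "internal" and ga != gb:
            continue  # skip cross-group pairs
        yield a, b
```

So with two folders A (two files) and B (one file), "yes" produces three comparisons, "external" two, and "internal" just the one pair inside A, which is why "External only" saves work once a tree has already been deduped against itself.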