Speed up for very large fragmented and duplicated music collection
Posted: Tue Dec 07, 2021 5:23 pm
I have a very large music collection which has been partly copied/backed up to several places and modified and I'm trying to get it back into one coherent whole.
Files are arranged in trees according to type (FLAC and MP3) and desirability (when space was short, demos and live recordings were hived off elsewhere). I've discovered caches here and there, and being an OCD completist I want to a) keep the best versions of what I have, b) not lose anything I already have, and c) remove dupes to save space (eliminate compilation copies etc).
Working space is no longer a consideration (very fast, large SSD array) and I've acquired killer CPU availability (Ryzen 7, 6 core / 12 thread). Duplicate Cleaner Pro barely registers on CPU usage (only a couple of threads in use out of 12), and comparison progress is slow when running it across the lot.
I'm guessing dividing and conquering would be quicker: pick a definitive directory tree, dedupe it internally (scan against self), set it as master, then add another tree as external if V5 is used (in V4 a scan against self isn't needed), merge its unique files into the master once deduped, and gradually add all the other trees the same way. Does that make sense? I'm guessing that if I have more than one master tree, effort is wasted comparing the two or more masters against each other...?
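For what it's worth, the divide-and-conquer idea above can be sketched outside Duplicate Cleaner with exact content hashes. This is only an illustrative sketch with made-up helper names (`index_tree`, `dupes_against_master`), not how Duplicate Cleaner works internally:

```python
# Sketch: dedupe one tree internally ("scan against self"),
# then compare each additional tree against the deduped master.
import hashlib
from pathlib import Path

def file_hash(path: Path, chunk: int = 1 << 20) -> str:
    """SHA-256 of a file's content, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def index_tree(root: Path) -> dict[str, list[Path]]:
    """Map content hash -> files; entries with >1 file are internal dupes."""
    index: dict[str, list[Path]] = {}
    for p in root.rglob("*"):
        if p.is_file():
            index.setdefault(file_hash(p), []).append(p)
    return index

def dupes_against_master(master_index: dict[str, list[Path]],
                         other: Path) -> list[Path]:
    """Files in `other` whose content already exists in the master index."""
    return [p for p in other.rglob("*")
            if p.is_file() and file_hash(p) in master_index]
```

After merging a tree's unique files into the master you refresh the master index and move to the next tree, so with a single master no two already-deduped trees ever get compared against each other.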
However, is there any way of utilising the cpu power available to speed things up? A Similarity scan on the whole lot was done in hours with the CPU and GPU added into the mix and the fans running hard...
Regards, Keith