I have been using DC since version 3, but I never made a lot of headway with THIRTY years' worth of duplicated material in text, image, video, and audio formats. I spent most of 2020 making a single pass through about 8TB, and I figured out how to crash DC4: don't start out with 1.6 million duplicate files on a single drive.
Now I need to regroup with V5, which should give me long-needed and long-awaited options. I thought I'd inquire here about strategy.
First, some background: I'm a local and family historian who manages a hard-copy repository for certain communities and surnames. I am in the process of digitizing everything for online access.
My ultimate goal is to end up with multiple external drives that contain various topics: Photos, Video, Audio, Books I've Written, Massive Collection of Local and Family History Files (both my own research and that of others). I have thousands of PDF files containing the contents of about 50 linear feet (approx. 15 meters) of documents.
That is my base. However, there are multiple iterations and copies of most of the material.
After spending 2020 manually deduplicating (because I'm a data control freak who doesn't want to lose important files), I need to designate a single external drive as the "hub" for storage. Then I need to compare every byte on the media I started with in 2020 and ensure that ONE copy of the most recent version of every file exists on the hub.
Then, I will feel comfortable storing those starter CDs, DVDs, floppy disks, and hard drives permanently and creating my topical drives (redundantly, of course) for vault storage.
The "master" concept was present in Version 4, but it wasn't terribly intuitive for me when working across multiple drives and media.
From what I've observed about the Comparison Wizard on the Version 5 splash screen, I should be able to accomplish my task of identifying MISSING files between two sources.
The problem is, I will have hundreds of thousands of missing files, and I don't want to go through the long process I had to go through in 2020.
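To make concrete what "missing" means here, this is a rough Python sketch of the idea, not a description of how DC works internally. The function names (`file_digest`, `iter_files`, `find_missing`) are made up for illustration. It hashes every file on the hub by content, then lists the files on a source whose content has no match anywhere on the hub, regardless of file name or folder.

```python
import hashlib
import os


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def iter_files(root):
    """Yield the full path of every file found under root, including subfolders."""
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            yield os.path.join(folder, name)


def find_missing(source_root, hub_root):
    """List files under source_root whose content appears nowhere under hub_root.

    Matching is by content only, so a renamed or moved copy on the hub
    still counts as present.
    """
    hub_digests = {file_digest(path) for path in iter_files(hub_root)}
    missing = []
    for path in iter_files(source_root):
        if file_digest(path) not in hub_digests:
            missing.append(path)
    return sorted(missing)
```

Calling `find_missing` with the root folder of an old drive and the root folder of the hub would return the paths that still need to be copied over. This only answers "is this exact content on the hub?"; it does not decide which of two differing versions is the most recent, and hashing terabytes of data this way would be slow without first narrowing candidates by file size.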
So...
I'm looking for suggestions others might have successfully implemented in similar circumstances -- or may be planning -- before I spend months of 2021 pretty much going back through 1.6 million files, even as I create thousands more from a massive scanning project I've got slated for 2021.
If you made it this far, thanks for taking the time to read -- and thanks in advance for your recommendations!
Billie, snowed-in in the Smoky Mountains, USA
