Whittling down duplicates from very large collection.
Posted: Sun May 25, 2025 10:51 pm
by bsacco
I have a large collection of photos and videos.
Too large to do a one-time scan for duplicates.
I have a MASTER folder where I want to keep all my deduped photos and videos.
What is the best practice/method for comparing another folder full of potential duplicates against my MASTER folder?
Is the answer setting my MASTER (de-duped) folder to "Master" in the Scan Location > Find Duplicates dropdown menu, and then setting the folder with the potential duplicates to "External only" in the same dropdown?
Can someone please provide step-by-step instructions? I find the documentation very confusing and not user-friendly.
Thanks, bob
Re: Whittling down duplicates from very large collection.
Posted: Mon May 26, 2025 4:01 pm
by bsacco
I have tried asking AI how to do this using DCP5, but all I get is general directions, not step-by-step. For example:
Duplicate Cleaner Pro 5 provides several effective strategies for managing large collections of duplicates:
Initial Setup and Scanning
Start by configuring your scan criteria carefully. Use the file type filters to focus on specific categories (photos, documents, music) rather than scanning everything at once. Set appropriate file size thresholds to exclude very small files that are likely system files or thumbnails.
Smart Selection Methods
The software offers multiple selection modes in the results view. Use "Select Oldest" or "Select Newest" to automatically mark files based on creation dates. The "Select Shortest Path" option helps keep files in more organized folder structures while removing those buried deep in subdirectories.
Preview and Verification
Before deleting anything, use the built-in preview feature to verify duplicates. This is especially important for images and documents where file names might be misleading. The software shows file details like dimensions, creation dates, and folder paths to help you make informed decisions.
Batch Processing Approach
Rather than trying to process everything at once, work in batches. Start with the most obvious duplicates (identical file names and sizes) before moving to more complex matches. This reduces the risk of accidentally deleting important files.
Safe Deletion Options
Use the "Move to Recycle Bin" option rather than permanent deletion initially. For even more safety, consider using the "Move to Folder" feature to relocate suspected duplicates to a review folder before final deletion.
Advanced Filtering
Leverage the advanced filtering options to exclude certain folders (like system directories) or include only specific file extensions. You can also set up ignore lists for files you know you want to keep multiple copies of.
The key is working systematically and verifying your selections before committing to deletions, especially when dealing with large collections where manual review of every duplicate isn't practical.
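The "Move to Folder" safety step described above can also be done outside DCP5. A minimal Python sketch, assuming you already have a flat list of marked file paths (the function name and collision-suffix scheme here are illustrative, not DCP5's own behavior):

```python
import shutil
from pathlib import Path

def move_to_review(marked: list[Path], review_dir: Path) -> None:
    """Relocate suspected duplicates into a review folder instead of
    deleting them, keeping original names and suffixing on collision."""
    review_dir.mkdir(parents=True, exist_ok=True)
    for src in marked:
        dest = review_dir / src.name
        n = 1
        # If two marked files share a name, append _1, _2, ... to the stem.
        while dest.exists():
            dest = review_dir / f"{src.stem}_{n}{src.suffix}"
            n += 1
        shutil.move(str(src), dest)
```

Nothing is deleted; you review the folder by hand and empty it only when you are satisfied.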
---------------------------------------------------------------
Any chance I can get step-by-step instructions on how to specifically use DCP5 to achieve my goal of de-duping a large collection?
Re: Whittling down duplicates from very large collection.
Posted: Tue May 27, 2025 10:04 am
by DigitalVolcano
Don't bother with AI - it will get confused and will give wrong/outdated information.
The process:
MAKE SURE YOU HAVE A SEPARATE BACKUP FIRST
- Set the "Master" folder to 'External only + Master' and 'Protected'
- Set the "potential duplicates" folder to 'External only'
- Run a Regular mode -> Same content scan
You'll now have a list of files that appear on both the master and the potential folder. The master folder ones are protected. (check this)
If you want to delete the duplicates from the 'potential' folder-
- Use the Selection Assistant
- Mark 'All but one in each group'. This will mark all the duplicates in the 'potential' folder, not the master folder.
- DOUBLE CHECK YOU'VE PROTECTED THE MASTER FOLDER. There should be nothing marked there if it is protected.
- Use the File Removal -> Delete function to delete the dupes. Send them to the Recycle Bin if there aren't too many.
You'll now be left with non-duplicate files in the 'potential' folder.
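For anyone curious what this master-vs-potential comparison does under the hood, the same "same content" check can be sketched with content hashes. A minimal sketch, not DCP5's actual implementation; the function names and folder paths are hypothetical:

```python
import hashlib
from pathlib import Path

def file_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def find_dupes_against_master(master: Path, potential: Path) -> list[Path]:
    """List files under `potential` whose content already exists under
    `master`. Files only in `potential` (no match in master) are left alone,
    mirroring the 'External only' vs 'External only + Master' setup."""
    master_hashes = {file_hash(p) for p in master.rglob("*") if p.is_file()}
    return [p for p in potential.rglob("*")
            if p.is_file() and file_hash(p) in master_hashes]
```

The returned list corresponds to the files the Selection Assistant would mark in the 'potential' folder: safe to remove because an identical copy lives in the master.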
Re: Whittling down duplicates from very large collection.
Posted: Tue May 27, 2025 5:10 pm
by bsacco
Thank You! Thank You! Thank You! Thank You! Thank You! Thank You! Thank You! Thank You! Thank You!
The most powerful info I received on this forum to date!
Re: Whittling down duplicates from very large collection.
Posted: Mon Jun 23, 2025 7:42 pm
by bsacco
OK, I have been busy whittling down my duplicates.
What I found is that many of the duplicates don't have the correct dates in the file name, e.g. "2004_10_30_at the beach.jpg" shows up as "at the beach.jpg" (no date), or has an incorrect date like "1960_01_05_at the beach.jpg".
My first thought was to export the processed duplicates by GROUP (post scan), so that I could rename all the duplicate files with the correct dates, making it easier to find the second-best duplicate of my master saved picture.
But, I could not find a post scan EXPORT by GROUP function.
Does anyone know another workaround for renaming all found duplicates in BULK by GROUP (post scan)?
The ability to rename files in bulk is key here because of the volume. I realize you can do it manually, but I'm looking at 225k files, so that's a no-go.
Best, Bob
Re: Whittling down duplicates from very large collection.
Posted: Tue Jun 24, 2025 4:07 pm
by DigitalVolcano
You can bulk rename marked files:
https://www.digitalvolcano.co.uk/duplic ... =&sct=NDY4
They can also be tagged with the group number.
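If the built-in rename doesn't fit, the group-number tagging idea can be sketched as a script. A minimal sketch, assuming you can export or build a mapping of group number to file paths yourself; the prefix format and function name are made up for illustration:

```python
from pathlib import Path

def tag_with_group(groups: dict[int, list[Path]]) -> None:
    """Prefix each duplicate's file name with its group number, e.g.
    'at the beach.jpg' in group 12 becomes 'g0012_at the beach.jpg',
    so all members of a group sort together in any file browser."""
    for group_id, paths in groups.items():
        for p in paths:
            p.rename(p.with_name(f"g{group_id:04d}_{p.name}"))
```

After tagging, sorting a folder by name lines every duplicate up next to its group-mates, which makes picking the second-best copy of each master picture much faster.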