Request: A way to scan duplicates according to specific metadata
Posted: Fri Jan 17, 2025 2:04 pm
Hi,
I'm not entirely sure how feasible this is, but as someone with a lot of files across a few folders who routinely scans for duplicates, I'd love a way to filter out a lot of files after the "reading metadata" stage and before the "matching duplicates" stage, to cut down the total search time. Sometimes I just want to run a duplicate scan against a few thousand files instead of the whole folder.
For example, assume I have a particular string in the Comments field of files, or I just want to scan any files that have *anything* in the Authors field. In the Scan Location tab, I'd like to be able to set up Keywords: 'SpecificKeyword' or Authors: 'SpecificKeyword' or even just 'Yes'. Then, on running the scan, the software would check the folders, ignore any files that lack those specific keywords or that Author name (or that have a blank Author field), and continue the scan from that point, instead of matching duplicates against all items in the folders.
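In case it helps, here is a rough Python sketch of the order of operations I have in mind. It is only an illustration under my own assumptions, not how the program actually works: the function names, the (path, metadata, content) tuple shape, and matching by content hash are all made up for the example.

```python
import hashlib
from collections import defaultdict


def matches_filter(metadata, field, wanted=None):
    """Return True if a file's metadata passes the filter.

    wanted=None means the field only has to be non-blank
    (the "or even just 'Yes'" case).
    """
    value = (metadata.get(field) or "").strip()
    if not value:
        return False  # blank or missing field: skip this file
    if wanted is None:
        return True
    return wanted.lower() in value.lower()


def find_duplicates(files, field, wanted=None):
    """files: list of (path, metadata dict, content bytes) tuples (hypothetical shape)."""
    # Step 1: after metadata is read, drop every file that fails the filter.
    candidates = [f for f in files if matches_filter(f[1], field, wanted)]

    # Step 2: only the survivors reach the matching stage.
    # Matching by SHA-256 of the content is an assumption for this sketch.
    groups = defaultdict(list)
    for path, _meta, content in candidates:
        groups[hashlib.sha256(content).hexdigest()].append(path)

    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


files = [
    ("a.jpg", {"Authors": "Alice"}, b"same"),
    ("b.jpg", {"Authors": "Alice"}, b"same"),
    ("c.jpg", {"Authors": ""}, b"same"),      # blank Authors: filtered out
    ("d.jpg", {"Authors": "Bob"}, b"other"),
]
print(find_duplicates(files, "Authors"))
```

The point is just that the expensive matching step only ever sees the files that survived the metadata filter, so c.jpg never gets compared even though its content matches.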
I'm not quite sure if I'm making that clear enough, but it would save a lot of time in doing "I'll just check I don't already have this file" type scans if I know what metadata I would add to the file and can quickly just scan from a smaller virtual list of files with said metadata. At the moment I've got an entire second folder where I keep only the files I expect to run into duplicates of more often, in an attempt to make scans faster. If the program had a built-in 'filter out files that don't have these metrics before running the matching duplicates subroutine', that would save me both time and storage space. For example, it just took me half an hour to do a duplicate scan to check a dozen files against my smaller storage folder, when being able to just say "scan against the master folder for any files with X in the Authors field" probably would have ended the scan after five minutes.
I did a bit of a bad mock-up to try to illustrate what I'm thinking. Again, I don't know how viable this would be as a suggestion, but I figured I'd try since you added the Keywords (Sorted) option, which was very helpful.