Image Sets found as Dupes?

Posted: Fri Nov 02, 2012 4:55 pm
by Graffxman
Greetings everyone;

I manage thousands of graphics/images and, although relatively new to DC, I have found it very useful in locating and cleaning up literally thousands of dupes.
However, I'm still not very confident in this product's capabilities, and I'm hesitant about deleting hundreds of files listed as Dupes by DC after discovering several instances where the files were, in fact, not dupes at all.
The whole idea of using this product, at least to me, is to save the time it takes to visually compare and find dupes using a standard image-management app.
With DC, and the lack of confidence after finding so many dupe errors, I'm not saving that much time if I have to use the image preview feature for every group found, especially when there are hundreds. That's very time consuming.
While experimenting, I discovered in several instances that DC would list an entire group as Dupes, but on using image preview I found that they were actually a series of images belonging to a set. That led me to wonder how awesome it would be to find a way of using DC to locate image SETS, not just Dupes, especially when dealing with hundreds of thousands of images spread across several folders, many of which are part of a set.
Meaning: let's say there is an entire set of 100 "1968 Pontiac GTO" images, maybe taken at the same time, but at different resolutions or aspect ratios, or modified at some point so that they won't show up using the "created date" or "modified date" search option.

I discovered that DC listed several sets as Dupes while using the following criteria:
Find: Image
Find Images: 72% similar
Find: Flipped (and) Rotated

After some experimenting I also discovered that using the options "Same Resolution" and "Same Aspect Ratio" ended up excluding Dupes that had modified resolutions and ratios, so I don't use those options for the most part.

The question is: unless I specifically want to locate possible image sets, how do I keep DC from listing sets as Dupes, so that I can delete all found Dupes with confidence rather than spend so much time verifying each group to make sure that DC didn't make an error?

Re: Image Sets found as Dupes?

Posted: Mon Nov 05, 2012 2:39 pm
by DigitalVolcano
Glad you like DC. What do you mean by Image sets?

72% is a fairly low bar for image dupes - did you try upping it a bit (say 90%)?

I'd be interested in having a copy of any images that DC is incorrectly grouping, for testing purposes!

thanks

Re: Image Sets found as Dupes?

Posted: Tue Nov 06, 2012 1:19 am
by Graffxman
Thanks for the reply!

Yes, I agree that 72% is a relatively low level to search for dupes, and no, I haven't experimented with higher settings.. yet.
I was so intrigued with the results of my initial settings that I haven't ventured further, and really haven't had time either..
As said, the results made me wonder how I could use this great app to locate image 'sets' as well, in addition to dupes.. now THAT would be awesome!

Sure, I can give you copies of the folders where the sets were found..
How do I get them to you..?
Of course, confidentiality is expected..

Rick~

Re: Image Sets found as Dupes?

Posted: Tue Nov 06, 2012 2:10 pm
by DigitalVolcano
Of course. If they aren't too big you can email them to software AT digitalvolcano.co.uk.
thanks

Re: Image Sets found as Dupes?

Posted: Tue Nov 06, 2012 5:15 pm
by Graffxman
As stated, I found many instances where DC listed files as Dupes that were not, based on the settings mentioned.
I ran the scan again and chose 2 of the dozens of groups that are supposed to be dupes, copied all the files into separate "DUPE_CHK" folders and made 2 zip files that I'm going to email to you..
Note:
The scan was done on a folder that contained numerous sub-folders, and the "Dupes" listed in each group were not all from the same sub-folder..
I have also included screen-caps of the settings and results so you can see how the groups were derived.

I'm curious to find out why DC listed these as "Dupes" when they are not..
Thanks ~

Re: Image Sets found as Dupes?

Posted: Tue Nov 06, 2012 7:42 pm
by DigitalVolcano
I think 72% is just too low to give useful results. 85% on these test images gives a good result, and going up to 95% separates out the similar ones even more. Both of these settings still (correctly) pick up the slightly resized dupes.
Give it a try on 90% and see what you think.
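
To illustrate why the threshold matters, here is a minimal sketch. It is NOT how DC compares images internally (that isn't described in this thread); it swaps in a simple "average hash" on tiny hand-made grayscale grids, and the names `average_hash`, `similarity`, `is_dupe` and the sample grids are made up for the example. The point is only that a genuinely different image from the same set can still clear a low bar like 72% while failing a higher one like 90%.

```python
# Toy illustration only: a simple "average hash", not DC's actual algorithm.
# Images are tiny grayscale grids (lists of rows of 0-255 values).

def average_hash(pixels):
    """Return a tuple of bits: 1 where a pixel is brighter than the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p > mean else 0 for p in flat)

def similarity(a, b):
    """Fraction of hash bits that match (1.0 means identical hashes)."""
    ha, hb = average_hash(a), average_hash(b)
    matches = sum(x == y for x, y in zip(ha, hb))
    return matches / len(ha)

def is_dupe(a, b, threshold):
    """True if the two images are at least `threshold` similar."""
    return similarity(a, b) >= threshold

# Hypothetical sample data: an original, a brightened copy of it,
# and a different shot of the same subject (a "set mate").
ORIGINAL = [[10, 10, 200, 200]] * 4
BRIGHTER_COPY = [[p + 20 for p in row] for row in ORIGINAL]
SET_MATE = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 200, 200, 200],
    [10, 10, 10, 200],
]

if __name__ == "__main__":
    print(similarity(ORIGINAL, BRIGHTER_COPY))  # the modified copy
    print(similarity(ORIGINAL, SET_MATE))       # the set mate
    print(is_dupe(ORIGINAL, SET_MATE, 0.72))    # low bar
    print(is_dupe(ORIGINAL, SET_MATE, 0.90))    # higher bar
```

In this toy data the brightened copy still matches at any threshold, while the set mate is only grouped at the low setting.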

Re: Image Sets found as Dupes?

Posted: Thu Nov 08, 2012 12:58 am
by Graffxman
Thanks for all the assistance!
Agreed, my settings were just way too low..
I re-scanned starting at 95% and did subsequent test scans at lower settings until I settled on a cutoff of 88%.. very pleased with the results.

I then tested on a folder with numerous sub-folders containing approx. 8800 files total.
I started the scan at 95% and DC found nearly 4000 dupes!
I did a random QC check on about 32 of the detected groups..no errors.
I moved them all to a new folder, re-scanned at 88% and moved those to another folder..
By the time I was done I was down to 64% and still getting actual dupes, but at that setting about half of the groups were not dupes.. nevertheless, I still found it highly useful, because the previous scans allowed me to remove the bulk (1000's of files!!) and end up with less than 100 groups, which were not a problem to check manually..
Yes, I'm extremely pleased and very glad I chose to buy DC Pro.. with some more experimentation on the settings I'll have this mastered in no time!
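
For anyone who wants to see the multi-pass idea spelled out, here is a rough sketch of the workflow above: scan at a strict threshold, move the dupes out, then re-scan what's left at a looser threshold. This is only a model of the manual process, not anything built into DC. The function names, the greedy grouping, and the toy similarity measure (plain numbers standing in for image files) are all assumptions made for the example.

```python
# Sketch of a strict-to-loose multi-pass scan. Everything here is
# hypothetical: items are plain numbers standing in for image files.

def group_similar(items, threshold, similarity):
    """Greedily group items whose similarity to a group's first member
    is at least `threshold`. Only groups with 2+ members are returned."""
    groups = []
    for item in items:
        for group in groups:
            if similarity(group[0], item) >= threshold:
                group.append(item)
                break
        else:
            groups.append([item])
    return [g for g in groups if len(g) > 1]

def multi_pass(items, thresholds, similarity):
    """Run one pass per threshold, strictest first. After each pass, keep
    the first member of every group and move the rest out of the pool.
    Items must be hashable (e.g. file paths)."""
    remaining = list(items)
    report = {}
    for threshold in sorted(thresholds, reverse=True):
        groups = group_similar(remaining, threshold, similarity)
        report[threshold] = groups
        moved = {x for g in groups for x in g[1:]}
        remaining = [x for x in remaining if x not in moved]
    return report, remaining

def toy_similarity(a, b):
    """Stand-in measure: numbers that are closer together are more similar."""
    return 1 - abs(a - b) / 100

if __name__ == "__main__":
    report, remaining = multi_pass(
        [100, 98, 90, 50, 70], [0.95, 0.88, 0.64], toy_similarity
    )
    for threshold, groups in report.items():
        # Groups found at the loosest settings still need a manual look.
        print(threshold, groups)
    print("left over:", remaining)
```

The strict passes clear out the bulk, so the loose pass only has to deal with a small pool, which is what makes checking every remaining group by hand practical.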

Notes / suggestion?
Under Selection Assistant > Mark > File Size
Either smallest or largest in each GROUP - very helpful feature..
However, under > Mark > Image SIZE (width/height) > smallest or largest .. it doesn't provide the option of "In each GROUP" and only selects the smallest/largest of them all..
For me at least, based on what I need to do with certain image files, it would be very helpful to have that option: select all the smallest or largest files from each Group, the object being to end up with only 1 file per group, either the largest or the smallest..
Thanks.
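
To make the request concrete, here is a small sketch of the per-group behaviour I have in mind. It is only an illustration of the idea, not an existing DC feature or API. The `Image` record, the `mark_by_image_size` function and the file names are hypothetical, and "size" is assumed to mean width x height in pixels.

```python
# Hypothetical sketch of "mark by image size, in each group".
from collections import namedtuple

Image = namedtuple("Image", ["name", "width", "height"])

def mark_by_image_size(groups, keep="largest"):
    """For every group, keep exactly one image (the largest or smallest
    by pixel area) and return the names of all the others as marked."""
    if keep not in ("largest", "smallest"):
        raise ValueError("keep must be 'largest' or 'smallest'")
    pick = max if keep == "largest" else min
    marked = []
    for group in groups:
        keeper = pick(group, key=lambda img: img.width * img.height)
        # Identity check so exactly one file survives, even on a size tie.
        marked.extend(img.name for img in group if img is not keeper)
    return marked

# Made-up sample groups for the example.
SAMPLE_GROUPS = [
    [Image("a.jpg", 800, 600), Image("a_small.jpg", 400, 300)],
    [
        Image("b.jpg", 1024, 768),
        Image("b_big.jpg", 2048, 1536),
        Image("b_thumb.jpg", 100, 75),
    ],
]

if __name__ == "__main__":
    print(mark_by_image_size(SAMPLE_GROUPS, keep="largest"))
    print(mark_by_image_size(SAMPLE_GROUPS, keep="smallest"))
```

Either way, each group ends up with only one unmarked file, which is the whole point of the suggestion.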