Image duplicates

kira13
Posts: 6
Joined: Thu Jul 10, 2014 7:18 pm

Image duplicates

Post by kira13 »

I used a different program than my usual one to import images, and it imported all the images from my camera's SD card instead of only the new ones. I tried running a duplicate image search, but something about the newly imported images is different enough that even a similarity scan doesn't find the duplicates correctly. In fact, when I lowered the similarity enough to find any duplicates at all, it flagged non-duplicate pictures as duplicates but still didn't find the (visibly, anyway) exact duplicates!

The dupes all have the same filenames as the originals with a (0) appended. Is there any way to search for the duplicates by filename? Or are there settings I have set wrong that would let me find them some other way? (For example, same EXIF data?)

These are easy to find and remove in Windows Explorer or in my photo organizer program, but I really don't want to remove 1800 duplicate photos by selecting each one individually.

Kira
DigitalVolcano
Site Admin
Posts: 1731
Joined: Thu Jun 09, 2011 10:04 am

Re: Image duplicates

Post by DigitalVolcano »

The first thing to check is that the new images are being included in the scan - is the total file count correct? Could the new program have imported them into a different folder? Or are they a different format such as RAW?
therube
Posts: 615
Joined: Tue Jun 28, 2011 4:38 pm

Re: Image duplicates

Post by therube »

Expecting that they are (exact) duplicates, I would have done a Regular Mode scan, with Same Content, first.

Though based on what you are saying, I'd expect that they are not exact duplicates.

So I'd examine two of the files, pict.jpg & pict(0).jpg, with some sort of comparison program that might point out "obvious" differences - like a tag at the head or end of the file that your importing program may have added. I'd expect something like that to be consistent between image pairs.
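For readers without a comparison program handy, the same check can be roughed out in a few lines of Python. This is only a sketch: the function names are made up for illustration, and it assumes the difference is one inserted or changed region, so it simply measures how many bytes match from the start and from the end of the two files.

```python
def common_prefix_len(a: bytes, b: bytes) -> int:
    """Count how many leading bytes are identical in a and b."""
    limit = min(len(a), len(b))
    i = 0
    while i < limit and a[i] == b[i]:
        i += 1
    return i


def common_suffix_len(a: bytes, b: bytes, prefix: int) -> int:
    """Count identical trailing bytes, without overlapping the shared prefix."""
    limit = min(len(a), len(b)) - prefix
    i = 0
    while i < limit and a[-1 - i] == b[-1 - i]:
        i += 1
    return i


def summarize_difference(a: bytes, b: bytes) -> dict:
    """Describe where two byte strings differ: shared head, shared tail, middle."""
    prefix = common_prefix_len(a, b)
    suffix = common_suffix_len(a, b, prefix)
    return {
        "size_a": len(a),
        "size_b": len(b),
        "identical_prefix": prefix,
        "identical_suffix": suffix,
        "differing_a": len(a) - prefix - suffix,
        "differing_b": len(b) - prefix - suffix,
    }


# Possible use on a real pair (file names taken from the post above):
# from pathlib import Path
# print(summarize_difference(Path("pict.jpg").read_bytes(),
#                            Path("pict(0).jpg").read_bytes()))
```

A long identical suffix with a short differing middle would suggest that one program inserted a block near the head of the file and left the image data alone.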

Having some idea what happened, what is different between files, I'd then go back to a Regular Mode scan but this time with Similar Content, perhaps starting with 90% & see if that doesn't pick "dups" out correctly.

Are the file pairs the same size?
kira13
Posts: 6
Joined: Thu Jul 10, 2014 7:18 pm

Re: Image duplicates

Post by kira13 »

The images are in the same folder. I import into a folder named after my camera model; that's why they all got " (0)" added to the filenames, the import program was adding them to the same place as the previous ones but didn't recognize that they were already there. They're all .JPGs; this camera doesn't do RAW files.

I did do a Regular Mode scan with Same Content first. I then did the Regular Mode scan with Similar Content, I think at 90% or 95%, and that's the one that found "duplicates" that weren't the same but not one pair of actual duplicates. Then I remembered that Image Mode existed and tried that.

What can I use to find out the differences between my files? The metadata I can see in Windows and my importing programs doesn't show me anything that looks relevant. (I did find that the Date Taken was one second different between one of the pairs, but that shouldn't make as much of a size difference as there is between them.)

The first program that produced the duplicates was one I had never before used to import photos, so it seems perfectly reasonable that either it or my previous one or both added something to the photos on import. After doing some reading, I determined that the program may well be incapable of importing only new images as opposed to all the images on the card, so I went back to my original program (which came with the camera)--only to find that it was being discontinued in favor of a newer one. So I downloaded the newer one and used it, and it clearly has the option to import only new photos--but apparently it has a database or something to remember them, because it also downloaded every photo over again, leaving me with another set of duplicates that had " (1)" added to the filenames.

Duplicate Cleaner Pro found all the duplicates (in Image Mode with 99% similarity, which was the first scan I tried) with the extra numbers, but still didn't find the original photos that were already on the drive before either import. So I successfully deleted the newer set of duplicates.

I use FreeFileSync to sync my photos to my server and then to my laptop; could that be writing something to them? *Edit: Also, File History backs them up; could it be an archive flag of some sort?

These files should have the same EXIF data; I don't suppose there's any way to include that as a criterion?
therube
Posts: 615
Joined: Tue Jun 28, 2011 4:38 pm

Re: Image duplicates

Post by therube »

> I did find that the Date Taken was one second different between one of the pairs

Is this stored in the file itself, or the (Windows) directory entry listing?
If the latter, then it is immaterial.

The changelog for the latest FreeFileSync 6.8 shows:

New comparison option to ignore file time shift in hours
Tentatively disabled DST hack affecting FAT file creation times

> FreeFileSync ... could that be writing something to them

No.
FFS does not write anything "into" any files.

> What can I use to find out the differences between my files?

I typically use the File Comparator built into my file manager, Altap Salamander.
Otherwise... offhand, not sure what to suggest...?
therube
Posts: 615
Joined: Tue Jun 28, 2011 4:38 pm

Re: Image duplicates

Post by therube »

Perhaps post 1 set of these "duplicate" jpg's.
DigitalVolcano
Site Admin
Posts: 1731
Joined: Thu Jun 09, 2011 10:04 am

Re: Image duplicates

Post by DigitalVolcano »

You could try unchecking the 'Don't scan system files/folders' option. Sometimes a system flag gets added to a folder/file, which means it gets skipped in a scan.
kira13
Posts: 6
Joined: Thu Jul 10, 2014 7:18 pm

Re: Image duplicates

Post by kira13 »

Okay. Tried the last suggestion from DigitalVolcano (unchecking "Don't scan system files/folders"). Didn't work.

I downloaded a few comparison tools. DiffImg wouldn't run on my Windows 8.1 system; VisiPics wouldn't find any duplicates. WinMerge found 3 differences between the two photos I loaded in it; 2 of the differences didn't show up in the differences section, but the third was a difference in text. One of the two photos had some kind of Adobe path listed (I think a web address, rather than a folder path), possibly appended onto the EXIF data. But WinMerge doesn't seem to be geared toward binary files, so I'm not sure how useful that was.

If I find what the difference is, how would I incorporate that into a Duplicate Cleaner Pro scan anyway? I've already run a scan previously with a lower percentage of difference, and when I lowered it enough to get duplicates (I forget the exact percentage), I wound up with non-duplicates that were considered duplicates, but none of the actual duplicates I was looking for. I'm running another one again now in case I only did that with an image scan instead of a regular one.

I can't upload the files, they're too big. I have a 16 megapixel camera that makes ~5 MB files. I can crop them or change the resolution or re-compress them, but that might also change whatever is different between them too. Let me see if I can use Dropbox or OneDrive--

Okay, I uploaded 2 pairs of duplicates to OneDrive. The first pair (DSC00024) is the one I checked with WinMerge, and the extra one was caused by importing with Serif PhotoStack; the 2nd pair (DSC01828) is a more recent one, and was caused by importing with Sony PlayMemories Home. The two originals were imported by Sony Picture Motion Browser. (Duplicate Cleaner Pro found and deleted all the ones that were duplicated by PlayMemories Home that were also duplicated by PhotoStack.) Here's the link:

https://onedrive.live.com/redir?resid=5 ... lder%2cJPG

Meanwhile, my scan at 90% similarity just finished and it found 2 pairs of duplicates; MAH00637.MP4.modd and MAH00638.MP4.modd is one, MAH00637 (1).MP4.modd and MAH00638 (1).MP4.modd is the other. So it did happen to find 2 real duplicates, it just found them more similar to different movies than to each other.

Oh, and the Date Taken must be part of the Windows directory entry listing; it wasn't different when looking at the metadata in PhotoStack.

What would really help right now is to find all the pairs with filenames that differ by having " (0)" or " (1)" appended to them. I already know these are duplicates, I just need to delete them all at once instead of one at a time. (I don't have any qualms about losing whatever the importing programs added to them.)
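A filename-only pairing like this can be sketched in Python outside of Duplicate Cleaner. The function name and the pattern below are assumptions for illustration: the pattern treats a trailing "(digits)", with or without a leading space, right before the extension as an import suffix, and a numbered file counts as a duplicate only if the un-numbered name also exists in the same folder. It only lists pairs; it deletes nothing and does not look at file contents.

```python
import re
from pathlib import Path

# Matches names such as "DSC00024 (0).JPG" or "MAH00637 (1).MP4.modd":
# a stem, an optional space, "(digits)", then one or more extensions.
NUMBERED = re.compile(r"^(?P<stem>.+?) ?\((?P<n>\d+)\)(?P<ext>(?:\.[^.]+)+)$")


def find_numbered_duplicates(folder):
    """Return (original, numbered_copy) pairs where both files exist in folder."""
    folder = Path(folder)
    pairs = []
    for path in sorted(folder.iterdir()):
        match = NUMBERED.match(path.name)
        if match is None or not path.is_file():
            continue
        # Rebuild the name the file would have had without the "(n)" suffix.
        original = folder / (match["stem"] + match["ext"])
        if original.is_file():
            pairs.append((original, path))
    return pairs


# Possible use (the folder path is a placeholder):
# for original, copy in find_numbered_duplicates(r"D:\Photos\MyCamera"):
#     print(copy.name, "looks like a copy of", original.name)
```

Because the match is by name alone, it is worth spot-checking a few of the listed pairs before acting on the whole list.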

Thanks for your help so far!
therube
Posts: 615
Joined: Tue Jun 28, 2011 4:38 pm

Re: Image duplicates

Post by therube »

(Ah, we're back up.)

Yep, that "Adobe" thing.
A "text" compare actually works better than a binary compare (in this case).
There is a single byte difference early on, then that Adobe thing.

 ß	~http://ns.adobe.com/xap/1.0/

<?xpacket begin="´++" id="W5M0MpCehiHzreSzNTczkc9d"?> <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 4.4.0-Exiv2"> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <rdf:Description rdf:about="" xmlns:exif="http://ns.adobe.com/exif/1.0/" exif:DateTimeOriginal="2011-12-03T17:23:12Z"/> </rdf:RDF> </x:xmpmeta>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                                                                                                                            <?xpacket end="w"?>
(There is additional padding (0x20 bytes, i.e. spaces) that does not display in the above code. Actually it does once I close the [ code ] tag correctly.)

Not sure how to tell you to work around that?
Looks like this, Extensible Metadata Platform (XMP).
Maybe there is a utility around to remove that data from the files; then a Similar Content comparison at 99% is much more likely to work.
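An alternative to stripping the data is to compare the files while ignoring it. In a JPEG, EXIF and XMP both sit in APP1 marker segments ahead of the compressed image data, so a rough Python sketch can hash everything except the APPn and comment segments. The function name is hypothetical, this is not how Duplicate Cleaner compares files, and it assumes a plain segment layout (no fill bytes between segments); treat it as an experiment to run on copies.

```python
import hashlib
import struct


def jpeg_content_hash(data: bytes) -> str:
    """SHA-256 of a JPEG's bytes, skipping APPn (EXIF/XMP/JFIF) and COM segments."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    digest = hashlib.sha256()
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xDA:
            # Start of Scan: the compressed image data runs to the end of file.
            digest.update(data[pos:])
            return digest.hexdigest()
        # The 2-byte big-endian length counts itself but not the marker bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        is_metadata = 0xE0 <= marker <= 0xEF or marker == 0xFE
        if not is_metadata:
            # Keep quantization tables, Huffman tables, frame header, etc.
            digest.update(data[pos:pos + 2 + length])
        pos += 2 + length
    raise ValueError("no Start of Scan marker found")


# Possible use on a pair of files:
# from pathlib import Path
# same = (jpeg_content_hash(Path("pict.jpg").read_bytes())
#         == jpeg_content_hash(Path("pict(0).jpg").read_bytes()))
```

If two files give the same hash here, they differ only in metadata segments; if not, something in the image data itself was rewritten on import.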
Last edited by therube on Sat Aug 09, 2014 8:26 pm, edited 2 times in total.
therube
Posts: 615
Joined: Tue Jun 28, 2011 4:38 pm

Re: Image duplicates

Post by therube »

cd "C:\Program Files (x86)\ExifTool"
RemoveJunkEXIF.bat *.jpg

Suppose you could try something like that & see how the files compare after that.
Work on a backup set of two pictures & see how it goes.
Still unlikely to be 100% duplicates but if it cuts cleanly a similar search might easily pick out your wanted dups.
And you could always move the found dups instead of deleting them outright. That would give you an opportunity (perhaps after first renaming the (0) files) to compare the source & dup directories for any oddities.
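The move-instead-of-delete idea could look something like the Python sketch below. The function name and the holding folder are placeholders, and the list of files to move is assumed to come from whatever scan or listing has already been checked by eye.

```python
import shutil
from pathlib import Path


def move_to_holding(paths, holding_dir):
    """Move each file into holding_dir; skip anything whose name already exists there."""
    holding = Path(holding_dir)
    holding.mkdir(parents=True, exist_ok=True)
    moved, skipped = [], []
    for src in map(Path, paths):
        dest = holding / src.name
        if dest.exists():
            # Never overwrite: leave the source in place and report it.
            skipped.append(src)
            continue
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved, skipped


# Possible use (paths are placeholders):
# moved, skipped = move_to_holding(
#     [r"D:\Photos\MyCamera\DSC00024 (0).JPG"], r"D:\Photos\_dups_to_review")
```

Nothing is deleted, so the holding folder can be compared against the originals and emptied only once the pairs check out.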

https://www.google.com/search?q=how+to+ ... eamonkey-a


I suppose Image Mode does not have an option to take metadata into consideration, but if it did, it would be a benefit to the program.