Known Duplicates showing up as unique

The best solution for finding and removing duplicate files.
DickyJean
Posts: 4
Joined: Sun Sep 13, 2020 10:11 pm

Post by DickyJean » Mon Sep 14, 2020 12:12 am

I'm getting files listed as unique when I'm certain there are duplicates. I have already gone through the troubleshooting at https://digvolsoft.freshdesk.com/suppor ... 6000029796.

I had typed this long post with screenshots, then when I went to submit it I learned I could only use three URLs. I put the whole story with screenshots on a rudimentary web page. Please don't laugh at it, I'm not a developer and I don't even dabble in code or HTML or whatever.

The web page with all of the details is here: https://sites.google.com/view/digitalvolcanohelp

Short version:
I ran the tool with these search criteria: same content, all file types. For the scan location, I set Yes for both "scan against" and "uniques" on the smaller data set that I want to rid of uniques. I got a huge number of uniques and no duplicates. However, I know that there are duplicates.

I ran it again with scan against off, getting the same results.

I ran a subset of the data to confirm that there are duplicates. There definitely are.

With the smaller data set, I switched to audio mode. I got more duplicates listed, but I still had false uniques.

I'm desperate and don't know what else to do. Please help me!
DigitalVolcano
Site Admin
Posts: 1290
Joined: Thu Jun 09, 2011 10:04 am

Re: Known Duplicates showing up as unique

Post by DigitalVolcano » Mon Sep 14, 2020 9:16 pm

It's possibly because it's using comparison mode.

"If a drive/folder is set to 'Don't scan against self' and 'find uniques' then any files that aren't duplicated on other drives are listed in the unique tab. This behavior is very useful for drive comparison."

See large thread here-
viewtopic.php?f=4&t=1882
therube
Posts: 488
Joined: Tue Jun 28, 2011 4:38 pm

Re: Known Duplicates showing up as unique

Post by therube » Tue Sep 15, 2020 4:42 pm

Known Duplicates showing up as unique
Taking just two of your known duplicates:
(02 Rock n' Roll Singer.m4a & 06 Overdose.m4a should work)
- are the file sizes the same?
- do the files generate the same hash (MD5, SHA1 or whatever)?

If they aren't the same size, they are not "same content" duplicates.
If they are the same size but do not generate the same hash, they are not "same content" duplicates.
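
The two checks above can also be scripted. What follows is only a minimal sketch in Python, not what DC itself does internally; the function names `file_digest` and `same_content` are made up for illustration, and SHA1 is picked simply because it is one of the hashes mentioned above.

```python
import hashlib
import os
import sys


def file_digest(path, algorithm="sha1", chunk_size=1024 * 1024):
    """Hash a file in chunks so large media files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_content(path_a, path_b, algorithm="sha1"):
    """Return True only if both files have the same size and the same hash."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False  # different sizes can never be "same content" duplicates
    return file_digest(path_a, algorithm) == file_digest(path_b, algorithm)


if __name__ == "__main__" and len(sys.argv) == 3:
    a, b = sys.argv[1], sys.argv[2]
    print("size:", os.path.getsize(a), os.path.getsize(b))
    print(" sha1:", file_digest(a), file_digest(b))
    print("same content:", same_content(a, b))
```

Saved under any name (say `compare.py`, a hypothetical filename) it would be run as `python compare.py fileA.m4a fileB.m4a`.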

HashMyFiles - Calculate MD5/SHA1/CRC32 hashes of your files
DickyJean
Posts: 4
Joined: Sun Sep 13, 2020 10:11 pm

Re: Known Duplicates showing up as unique

Post by DickyJean » Wed Sep 16, 2020 6:51 pm

I checked and the files are the same size. I downloaded the hash tool, and it looks like they are calculating different hashes.

Why would that be? One was copied and pasted from the other. Is there a configuration of duplicate pro that would pick these up as duplicates?

grr, I'm doing something wrong and the image won't show https://drive.google.com/file/d/1qitLNm ... sp=sharing
therube
Posts: 488
Joined: Tue Jun 28, 2011 4:38 pm

Re: Known Duplicates showing up as unique

Post by therube » Wed Sep 16, 2020 9:34 pm

As the hashes differ, they are certainly different.

Maybe the tags are different?
If you played the song, some media players (may, can, will) update tags?
(No idea what Win10 itself may or may not do.)

Maybe corruption, like a bad spot on a drive?
(Probably not, but.)


Might throw the two files at WinMerge.
Might point something out.
If it's only a "single line" difference, that's probably a tag.
If there are extensive differences, might bear more investigating.
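
If WinMerge output is hard to read, the same "small difference vs. extensive difference" question can be roughly answered with a few lines of Python. This is just a sketch under the assumption that the two files are the same size; `compare_bytes` is a made-up name, and it only reports where the first difference is and how many bytes differ in total.

```python
def compare_bytes(path_a, path_b, chunk_size=1024 * 1024):
    """Return (first_diff_offset, differing_byte_count) for two files.

    first_diff_offset is None when the compared bytes are identical.
    Only the overlapping length is compared if the sizes differ.
    """
    first_diff = None
    diff_count = 0
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a or not b:
                break
            n = min(len(a), len(b))
            if a[:n] != b[:n]:
                # Walk the chunk only when it is known to contain a difference.
                for i in range(n):
                    if a[i] != b[i]:
                        if first_diff is None:
                            first_diff = offset + i
                        diff_count += 1
            offset += n
    return first_diff, diff_count
```

A handful of differing bytes near the start or end of the file would be consistent with a tag change; a large count starting somewhere in the middle would point toward damaged audio data.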


(Maybe host the two files on Google, [as you did your screenshot].
[I wouldn't be able to get to them till sometime tomorrow, maybe.])
DickyJean
Posts: 4
Joined: Sun Sep 13, 2020 10:11 pm

Re: Known Duplicates showing up as unique

Post by DickyJean » Thu Sep 17, 2020 11:09 pm

therube wrote:
Wed Sep 16, 2020 9:34 pm
Might throw the two files at WinMerge.

(Maybe host the two files on Google, [as you did your screenshot].
[I wouldn't be able to get to them till sometime tomorrow, maybe.])
Thank you for being so patient with me. I tried WinMerge, but it was all gibberish to me. I'm way out of my depth here.

I was able to upload the files:
Server Version: https://drive.google.com/file/d/1eY1jus ... sp=sharing
Local Version: https://drive.google.com/file/d/1y0T7Ma ... sp=sharing

That part I can figure out :)
therube
Posts: 488
Joined: Tue Jun 28, 2011 4:38 pm

Re: Known Duplicates showing up as unique

Post by therube » Fri Sep 18, 2020 6:56 pm

Without hearing (no speakers on the system I'm on), but by comparing, & viewing the files, & by "playing" them, I'd say server is corrupt.

It has a LARGE section of nulls, starting about 1/3 of the way through the file.
If I "play" it, its "timeline" gets to ~1:15 mark then fast-forwards to the end (i.e., it is only playing up to the 1:15 mark).

The file on the server is corrupt.
The 00 values are "null" bytes, & continue from where first displayed, all the way to the end of the file.
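
A long run of null bytes like this can also be looked for without a hex viewer. The following is a rough Python sketch, with `longest_null_run` being a made-up helper name. It is byte-by-byte and therefore slow on a big library, and how long a run has to be before it counts as "suspicious" is left to the reader, since healthy files can contain short runs of zeros too.

```python
def longest_null_run(path, chunk_size=1024 * 1024):
    """Return (start_offset, length) of the longest run of 0x00 bytes.

    Returns (None, 0) if the file contains no null bytes at all.
    """
    best_start, best_len = None, 0
    run_start, run_len = None, 0
    offset = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            for i, byte in enumerate(chunk):
                if byte == 0:
                    if run_len == 0:
                        run_start = offset + i  # a new run begins here
                    run_len += 1
                    if run_len > best_len:
                        best_start, best_len = run_start, run_len
                else:
                    run_len = 0
            offset += len(chunk)
    return best_start, best_len
```

If `start_offset + length` equals the file size, the nulls run all the way to the end of the file, which is the pattern described here.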


That begs the question, what happened?
Could very well have a bad drive, bad connection... ?
But almost assuredly one copy is good, & one is bad.


Given the above, this is expected:


ffmpeg -i "%~1" -v error -f null - 2>&1


[aac @ 035df3e0] decode_band_types: Input buffer exhausted before END element found
Error while decoding stream #0:0: Invalid data found when processing input
[aac @ 035df3e0] channel element 0.0 is not allocated
Error while decoding stream #0:0: Invalid data found when processing input
[aac @ 035df3e0] channel element 0.0 is not allocated
Error while decoding stream #0:0: Invalid data found when processing input
[aac @ 035df3e0] channel element 0.0 is not allocated
...

I'd say that it is a very good thing that DC is finding "Known Duplicates showing up as unique".
(It has certainly clued you in to an issue on your end.)
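
For a whole library, the ffmpeg check above could be looped over many files. Below is a hedged Python sketch of that idea: it uses the same ffmpeg options as the command above, assumes ffmpeg is on the PATH, and the helper names (`ffmpeg_check_command`, `decode_errors`, `find_suspect_files`) are invented for illustration. Note that a file decoding without errors is not proof it is a perfect copy; it only means ffmpeg found nothing to complain about.

```python
import subprocess


def ffmpeg_check_command(path):
    """Build the decode-only ffmpeg command shown above for one file."""
    return ["ffmpeg", "-i", path, "-v", "error", "-f", "null", "-"]


def decode_errors(path, run=subprocess.run):
    """Return the error lines ffmpeg prints while decoding `path`."""
    result = run(
        ffmpeg_check_command(path),
        stdin=subprocess.DEVNULL,   # keep ffmpeg from waiting on keyboard input
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,     # ffmpeg writes its messages to stderr
        text=True,
    )
    return [line for line in result.stderr.splitlines() if line.strip()]


def find_suspect_files(paths, run=subprocess.run):
    """Map each path that produced decode errors to its list of error lines."""
    suspects = {}
    for path in paths:
        errors = decode_errors(path, run=run)
        if errors:
            suspects[path] = errors
    return suspects
```

One way to feed it would be `find_suspect_files(str(p) for p in pathlib.Path("Music").rglob("*.m4a"))`, where `Music` is a placeholder folder name; any file that shows up in the result would be the copy not to trust.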
DickyJean
Posts: 4
Joined: Sun Sep 13, 2020 10:11 pm

Re: Known Duplicates showing up as unique

Post by DickyJean » Sat Sep 19, 2020 5:58 am

Thank you so much.

Egads! I've spent days trying to consolidate libraries from two old laptops, one current one, and downloads from Google, Amazon and iTunes. I couldn't say which versions I trust, short of ripping 300 CDs again. I certainly can't listen to all 6,500 songs!

It sounds like using the audio mode on DC won't help, because the file may be corrupt after the 2 minute mark. Is there a tool to know which version to trust?

Will I get a cleaner file copy if I bypass the Windows copy function and use DC with the Windows Shell unchecked?

Not related: if I do rip the CDs again, is it better to use something like Exact Audio Copy, or is the simple interface of iTunes OK?

I appreciate all of the help you've given. It really is so helpful with my limited knowledge.

I set up a new NAS today since it seems the existing server can't handle what I'm throwing at it.