Known Duplicates showing up as unique

The best solution for finding and removing duplicate files.
DickyJean
Posts: 5
Joined: Sun Sep 13, 2020 10:11 pm

Post by DickyJean

I'm getting files listed as unique when I'm certain there are duplicates. I have already gone through the troubleshooting steps at https://digvolsoft.freshdesk.com/suppor ... 6000029796.

I had typed a long post with screenshots, but when I went to submit it, I learned I could only use three URLs. So I put the whole story, with screenshots, on a rudimentary web page. Please don't laugh at it; I'm not a developer, and I don't even dabble in code or HTML or whatever.

The web page with all of the details is here: https://sites.google.com/view/digitalvolcanohelp

Short version:
I ran the tool with these search criteria: same content, all file types. In the scan locations, I set both "Scan against" and "Find uniques" to Yes for the smaller data set (the one I want to rid of uniques). I got a huge number of uniques and no duplicates. However, I know that there are duplicates.

I ran it again with "Scan against" turned off and got the same results.

I ran a subset of the data to confirm that there are duplicates. There definitely are: [screenshot]

With the smaller data set, I switched to audio mode. I got more duplicates listed, but I still had false uniques.

I'm desperate and don't know what else to do. Please help me!
DigitalVolcano
Site Admin
Posts: 1717
Joined: Thu Jun 09, 2011 10:04 am

Re: Known Duplicates showing up as unique

Post by DigitalVolcano

It's possibly because it's using comparison mode.

"If a drive/folder is set to 'Don't scan against self' and 'find uniques' then any files that aren't duplicated on other drives are listed in the unique tab. This behavior is very useful for drive comparison."

See the large thread here:
viewtopic.php?f=4&t=1882
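A toy model of the quoted rule may make it clearer why identical copies sitting in the same location can land in the Unique tab. This is only a sketch under stated assumptions, not Duplicate Cleaner's actual code: each location is a hypothetical dict of file name to content hash, and a file in the target location counts as unique unless its hash also appears in some *other* location.

```python
def comparison_uniques(target, others):
    """Hypothetical model of "Don't scan against self" + "Find uniques".

    target: dict mapping file name -> content hash for the location being checked.
    others: list of such dicts, one per other scan location.
    Files inside `target` are never compared with each other, so two identical
    copies that both live in `target` can both be reported as unique.
    """
    other_hashes = set()
    for location in others:
        other_hashes.update(location.values())
    # A target file is "unique" when no other location has the same content.
    return sorted(name for name, digest in target.items() if digest not in other_hashes)
```

For example, with `target = {"a.m4a": "h1", "a copy.m4a": "h1", "b.m4a": "h2"}` and `others = [{"b.m4a": "h2"}]`, both `a` files come back as unique even though they match each other, which is the kind of result described in the first post.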
therube
Posts: 614
Joined: Tue Jun 28, 2011 4:38 pm

Re: Known Duplicates showing up as unique

Post by therube

Known Duplicates showing up as unique
Take just two of your known duplicates
(02 Rock n' Roll Singer.m4a & 06 Overdose.m4a should work):
- Are the file sizes the same?
- Do the files generate the same hash (MD5, SHA1 or whatever)?

If they aren't the same size, they are not "same content" duplicates.
If they are the same size but do not generate the same hash, they are not "same content" duplicates.

HashMyFiles - Calculate MD5/SHA1/CRC32 hashes of your files
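For anyone who prefers a script to HashMyFiles, the size-then-hash check above can be sketched in Python with the standard `hashlib` module. MD5 is used only because it is one of the hashes mentioned; any algorithm name that `hashlib.new` accepts (for example `"sha1"`) would do. The function names are made up for this example.

```python
import hashlib
import os


def file_hash(path, algo="md5", chunk_size=1 << 20):
    """Hash a file in chunks so large audio files don't need to fit in memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_content(path_a, path_b, algo="md5"):
    """True only if both files have the same size AND the same hash."""
    # Different sizes: cannot be "same content" duplicates, so skip hashing.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    # Same size: still only duplicates if the hashes match too.
    return file_hash(path_a, algo) == file_hash(path_b, algo)
```

Calling `same_content` on the two copies of a song (the paths are whatever yours are) returns `False` for either kind of mismatch.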
DickyJean
Posts: 5
Joined: Sun Sep 13, 2020 10:11 pm

Re: Known Duplicates showing up as unique

Post by DickyJean

I checked, and the files are the same size. I downloaded the hash tool, and it looks like the two files produce different hashes.

[Screenshot of the hash results]

Why would that be? One was copied and pasted to create the other. Is there a configuration of Duplicate Cleaner Pro that would pick these up as duplicates?

Grr, I'm doing something wrong and the image won't show. Here it is: https://drive.google.com/file/d/1qitLNm ... sp=sharing
therube
Posts: 614
Joined: Tue Jun 28, 2011 4:38 pm

Re: Known Duplicates showing up as unique

Post by therube

Since the hashes differ, the files are certainly different.

Maybe the tags are different?
If you played the song, some media players may update its tags.
(No idea what Windows 10 itself may or may not do.)

Maybe corruption, like a bad spot on a drive?
(Probably not, but possible.)


You might throw the two files at WinMerge; it might point something out.
If there's only a "single line" difference, that's probably a tag.
If there are extensive differences, it might bear more investigating.
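A rough, script-based stand-in for eyeballing the files in WinMerge is to list the byte ranges where two files differ. Under the rule of thumb above, one or two small ranges would fit a tag edit, while one long range would point to something else. This is a sketch with a hypothetical function name; it reports raw byte offsets and knows nothing about the m4a format.

```python
def diff_ranges(path_a, path_b, chunk_size=1 << 16):
    """Return (start, end) byte ranges where two files differ; end is exclusive.

    If one file is longer, its extra tail is reported as a differing range.
    """
    ranges = []
    start = None  # offset where the current differing run began, if any
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break
            if a == b:
                # Fast path: an identical chunk closes any open run.
                if start is not None:
                    ranges.append((start, offset))
                    start = None
                offset += len(a)
                continue
            n = max(len(a), len(b))
            for i in range(n):
                differ = i >= len(a) or i >= len(b) or a[i] != b[i]
                if differ and start is None:
                    start = offset + i
                elif not differ and start is not None:
                    ranges.append((start, offset + i))
                    start = None
            offset += n
    if start is not None:
        ranges.append((start, offset))
    return ranges
```

Printing each `(start, end)` pair along with `end - start` gives a quick picture of whether the difference is a few bytes or most of the file.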


(Maybe host the two files on Google Drive, as you did your screenshot.
I wouldn't be able to get to them till sometime tomorrow, maybe.)
DickyJean
Posts: 5
Joined: Sun Sep 13, 2020 10:11 pm

Re: Known Duplicates showing up as unique

Post by DickyJean

therube wrote: Wed Sep 16, 2020 9:34 pm
Might throw the two files at, WinMerge.

(Maybe host the two files on Google, [as you did your screenshot].
[I wouldn't be able to get to them till sometime tomorrow, maybe.])

Thank you for being so patient with me. I tried WinMerge, but it was all gibberish to me. I'm way out of my depth here.

I was able to upload the files:
Server Version: https://drive.google.com/file/d/1eY1jus ... sp=sharing
Local Version: https://drive.google.com/file/d/1y0T7Ma ... sp=sharing

That part I can figure out :)
therube
Posts: 614
Joined: Tue Jun 28, 2011 4:38 pm

Re: Known Duplicates showing up as unique

Post by therube

Without hearing them (no speakers on the system I'm on), but by comparing and viewing the files, and by "playing" them, I'd say the server copy is corrupt.

It has a LARGE section of nulls, starting about 1/3 of the way through the file.
If I "play" it, its "timeline" gets to the ~1:15 mark, then fast-forwards to the end (i.e., it only plays up to the 1:15 mark).

The file on the server is corrupt.
The 00s are "null" bytes, and they continue from where they first appear all the way to the end of the file.
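The "nulls all the way to the end" pattern can be checked without a hex viewer. This hypothetical helper walks backwards from the end of a file and returns the offset where a trailing run of 0x00 bytes begins, or None if the file doesn't end in a null byte. A handful of trailing nulls can be harmless padding, so the size of the run is what matters; in the server copy described here, it started roughly a third of the way in.

```python
import os


def trailing_null_start(path, chunk_size=1 << 16):
    """Return the offset where a run of 0x00 bytes that reaches the end of
    the file begins, or None if the file is empty or its last byte isn't 0x00."""
    size = os.path.getsize(path)
    pos = size
    with open(path, "rb") as f:
        # Walk backwards from the end, one chunk at a time.
        while pos > 0:
            read_from = max(0, pos - chunk_size)
            f.seek(read_from)
            chunk = f.read(pos - read_from)
            data = chunk.rstrip(b"\x00")
            if data:
                # Found the last non-null byte; the null run starts right after it.
                start = read_from + len(data)
                return start if start < size else None
            pos = read_from
    # Every byte was 0x00 (or the file was empty).
    return 0 if size > 0 else None
```

Dividing a non-None result by `os.path.getsize(path)` gives the fraction of the file at which the nulls begin.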

[Screenshot of the server file, showing the null bytes]


That raises the question: what happened?
Could very well be a bad drive, a bad connection...?
But almost certainly one copy is good and one is bad.


Given the above, this ffmpeg output is expected:

Code:

rem Decode the whole file, discard the output, and show only errors
ffmpeg -i "%~1" -v error -f null - 2>&1

Code:

[aac @ 035df3e0] decode_band_types: Input buffer exhausted before END element found
Error while decoding stream #0:0: Invalid data found when processing input
[aac @ 035df3e0] channel element 0.0 is not allocated
Error while decoding stream #0:0: Invalid data found when processing input
[aac @ 035df3e0] channel element 0.0 is not allocated
Error while decoding stream #0:0: Invalid data found when processing input
[aac @ 035df3e0] channel element 0.0 is not allocated
...

I'd say that it is a very good thing that DC is finding "Known Duplicates showing up as unique".
(It has certainly clued you in to an issue on your end.)
DickyJean
Posts: 5
Joined: Sun Sep 13, 2020 10:11 pm

Re: Known Duplicates showing up as unique

Post by DickyJean

Thank you so much.

Egads! I've spent days trying to consolidate libraries from two old laptops, one current one, and downloads from Google, Amazon and iTunes. I couldn't say which versions to trust, short of ripping 300 CDs again. I certainly can't listen to all 6,500 songs!

It sounds like using the audio mode in DC won't help, because the file may be corrupt after the 2-minute mark. Is there a tool that can tell me which version to trust?

Will I get a cleaner copy of a file if I bypass the Windows copy function and use DC with "Windows Shell" unchecked?

Unrelated: if I do rip the CDs again, is it better to use something like Exact Audio Copy, or is iTunes, with its simple interface, OK?

I appreciate all of the help you've given. It really is so helpful with my limited knowledge.

I set up a new NAS today since it seems the existing server can't handle what I'm throwing at it.
therube
Posts: 614
Joined: Tue Jun 28, 2011 4:38 pm

Re: Known Duplicates showing up as unique

Post by therube

Oh, here's a start (slightly modified from code on superuser.com):

checkaudio.bat:

Code:

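@echo off
rem checkaudio.bat: decode audio files with ffmpeg and log any errors.
rem Point FFMPEG at your copy of ffmpeg.exe if it is not on the PATH.
set "FFMPEG=ffmpeg.exe"

rem Optional first argument: a file mask. Default: mp3, m4a and mp4.
set "filtro=%~1"
if "%filtro%"=="" (
    set "filtro=*.mp3 *.m4a *.mp4"
)

rem The /R switch walks the current directory and all of its subdirectories.
for /R %%a in (%filtro%) do call :doWork "%%a"

pause
exit /B

:doWork
    echo Processing: %1
    rem Decode the whole file to the null muxer. Only errors are written,
    rem so an empty .log file means ffmpeg found no problems.
    "%FFMPEG%" -v error -i %1 -f null - > "%~1.log" 2>&1
    exit /B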
It searches for mp3, m4a and mp4 files in the directory you run the batch file from (and, because of for /R, in all of its subdirectories), printing each file's name and logging any errors found to filename.log next to that file.

You would need ffmpeg.exe, and you would need to point to its location in the batch file above (or have it on your PATH).
(zeranoe.com was the go-to place for it, but he just shut his site down.
There's a version here: FFmpeg-x86 git N-98032-g6e1903938b.)

Code:

09/21/2020  01:44 PM                 0 02 Rock 'n' Roll Singer Local.m4a.log
09/21/2020  01:45 PM         1,283,830 02 Rock 'n' Roll Singer Server.m4a.log
09/21/2020  01:23 PM         1,288,156 err.log
09/21/2020  01:44 PM         1,283,830 xxx.mp3.log
The Local log is an empty file (0 bytes), i.e., no errors were detected.
The others contain something, so they have to be considered suspect.
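With thousands of files, picking out the non-empty logs by hand gets tedious. A small Python sketch, assuming the logs were written the way checkaudio.bat writes them (the audio file's name plus ".log", next to each file), can list the audio files whose log is not empty. The function name is made up for this example.

```python
from pathlib import Path


def suspect_files(root):
    """Return the audio files under `root` whose .log file is not empty.

    ASSUMPTION: each log is named after its audio file plus '.log'
    (e.g. 'song.m4a.log'). With '-v error', ffmpeg writes nothing for a
    clean file, so any content in the log means it hit a problem.
    """
    suspects = []
    for log in Path(root).rglob("*.log"):
        if log.is_file() and log.stat().st_size > 0:
            suspects.append(log.with_suffix(""))  # 'song.m4a.log' -> 'song.m4a'
    return sorted(suspects)
```

Stray logs that don't follow that naming (such as an err.log) will show up too, just without a meaningful audio file name.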

Also, any set of files that you know are duplicates but that Duplicate Cleaner flagged as "unique" is also going to be suspect.

So if you're able to segregate those sets from all the rest, it would be much quicker to check only those particular files rather than all files. (You'd still, maybe, want to check all files, just to know.)
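One way to segregate those sets is sketched below. It assumes, and this is only an assumption, that copies of the same song kept the same file name across libraries. It groups audio files from several library folders by name and reports the names whose copies don't all hash the same; those would be the first files to run through ffmpeg. The function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """MD5 of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def conflicting_copies(roots, exts=(".mp3", ".m4a", ".mp4")):
    """Group audio files from several folders by file name (case-insensitive)
    and return {name: [paths]} for names whose copies don't all share one hash.

    ASSUMPTION: copies of the same song kept the same file name.
    """
    by_name = defaultdict(list)
    for root in roots:
        for p in Path(root).rglob("*"):
            if p.is_file() and p.suffix.lower() in exts:
                by_name[p.name.lower()].append(p)
    conflicts = {}
    for name, paths in by_name.items():
        # Only names seen more than once can be "known duplicates".
        if len(paths) > 1 and len({md5_of(p) for p in paths}) > 1:
            conflicts[name] = sorted(paths)
    return conflicts
```

Files renamed between libraries (or tagged with different names) will slip past this grouping, so it narrows the search rather than replacing a full check.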

Also, it would be interesting to find out where the issue lies. Is the drive in the (old) NAS failing, or does it have bad spots? A S.M.A.R.T. test might show that. Or might it be a connection issue? For example, you're connected to a USB3 port and that particular port is just screwy, where a USB2 port might be trouble-free, or a rear port vs. a front port (on your case)...

(A more automated, GUI-based way? Not sure.
Foobar2000's File Integrity Verifier was mentioned, but I'm not familiar with it or how it works.)