
Similar Content Percentage question

Posted: Wed Aug 28, 2019 12:32 am
by bobsage
I just want to clarify how this option works before I do a large scan on my PC with it.

Say I have two copies of the same video. One is 1,798,317,337 bytes and the other is 1,796,629,143 bytes.

Both are essentially the same video, with maybe 1 second added to one of them, hence the size difference. If I have similar content set to 99%, would this qualify as a match?

I'm not sure whether this match goes by byte similarity, actual content (i.e. it checks that the videos themselves are 99% similar), or something else.

Please clarify if possible.

Re: Similar Content Percentage question

Posted: Wed Aug 28, 2019 4:44 pm
by DigitalVolcano
It is byte similarity. Whether you get a match at 99% will depend on the encoding similarity, headers, metadata, etc.
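
For a rough feel of what "byte similarity" could mean, here is a minimal Python sketch. It assumes a naive position-by-position comparison; the thread does not say how DC actually computes its percentage, and `positional_similarity` is a hypothetical helper, not part of the program. It also works out how far apart the two file sizes from the question are.

```python
def positional_similarity(a: bytes, b: bytes) -> float:
    """Fraction of byte positions holding the same value, relative to the longer input.

    Assumption: a naive position-by-position comparison, not DC's real algorithm.
    """
    longer = max(len(a), len(b))
    if longer == 0:
        return 1.0
    same = sum(x == y for x, y in zip(a, b))
    return same / longer


# File sizes quoted in the question
size_a = 1_798_317_337
size_b = 1_796_629_143

# Ratio of the smaller size to the larger one
size_ratio = min(size_a, size_b) / max(size_a, size_b)
print(f"size ratio: {size_ratio:.4%}")
```

The sizes differ by roughly 0.09%, so the size gap alone would not rule out a 99% match. Whether the bytes themselves line up is a separate matter, which is where encoding, headers and metadata come in.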

Re: Similar Content Percentage question

Posted: Thu Aug 29, 2019 3:00 am
by therube
To note...


I've got some videos where, when I muxed the audio & video, I'd experiment with an offset, say a 100 ms delay in the audio.
And while the output file sizes may be identical, & while the video portion & the audio portion are identical, because of the delay the files are different, substantially so to a file comparison program.

So for something like that, DC, Similar Content, will not find them as "duplicates", though they "are".
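
A small Python sketch of why a shift can hurt so much. It assumes a position-by-position byte comparison (not confirmed as DC's method), and the 4 inserted bytes are only a stand-in for the effect of an offset, not a model of how a muxer lays out a file. For contrast it also uses the standard library's `difflib.SequenceMatcher`, which aligns the two inputs before comparing them.

```python
import difflib
import random


def positional_similarity(a: bytes, b: bytes) -> float:
    # Hypothetical helper: share of positions with equal bytes, relative to the longer input
    longer = max(len(a), len(b))
    same = sum(x == y for x, y in zip(a, b))
    return same / longer if longer else 1.0


rng = random.Random(0)  # fixed seed so the run is repeatable
original = bytes(rng.getrandbits(8) for _ in range(2000))

# Same data with 4 extra bytes inserted after the first 100 bytes
shifted = original[:100] + b"\x00\x00\x00\x00" + original[100:]

positional = positional_similarity(original, shifted)
aligned = difflib.SequenceMatcher(None, original, shifted, autojunk=False).ratio()

print(f"position by position: {positional:.1%}")
print(f"after alignment:      {aligned:.1%}")
```

Nearly all of the data is shared, but once everything after the insert is shifted, a position-by-position check sees mostly mismatches, while the aligned comparison still rates the two as almost identical.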


In other instances, I encoded the same content, only using different versions of ffmpeg.
ffmpeg writes the version number used to encode the file into the output file.
So while I may have used the same encoder options, & while the output files end up being the same size (along with the content), the files themselves are different, because of the ffmpeg "header" (if you will) version number.

In this case, DC, Similar Content, should find the files to be "duplicates" - because they are, essentially.
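
The same idea as a quick Python sketch, again assuming a position-by-position comparison. The two "header" strings are made-up placeholders of equal length, not real ffmpeg output, and the random payload stands in for identical encoded content.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable
payload = bytes(rng.getrandbits(8) for _ in range(10_000))

# Made-up version strings, same length, differing in one character
header_a = b"ENCODER-VERSION-4.1"
header_b = b"ENCODER-VERSION-4.2"

file_a = header_a + payload
file_b = header_b + payload

# Count positions where the two "files" disagree
differing = sum(x != y for x, y in zip(file_a, file_b))
similarity = 1 - differing / len(file_a)

print(f"differing bytes: {differing}, similarity: {similarity:.3%}")
```

Because the sizes match and nothing is shifted, only the version byte differs, so the similarity stays far above 99%.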