Video mode -> Hash (Video), what does that do?

therube
Posts: 615
Joined: Tue Jun 28, 2011 4:38 pm

Post by therube »

Video mode -> Hash (Video)

I understand that it performs an MD5 hash on V+A / V / A,

but just what does that mean, and what does it do?
And how is it different from Regular mode -> Hash?

Does it extract V+A (or V or A alone) from the video clip
in a "raw" (or some such) format,
so that any tag or header differences are ignored, or something else?
DigitalVolcano
Site Admin
Posts: 1729
Joined: Thu Jun 09, 2011 10:04 am

Re: Video mode -> Hash (Video), what does that do?

Post by DigitalVolcano »

It's a hash on the raw video or audio data in the file (without headers, etc).
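
To illustrate the idea in this reply (this is not Duplicate Cleaner's actual implementation), here is a minimal Python sketch using a made-up toy container: a 4-byte header length, the header (tags), then the payload. Two files with the same payload but different tags get different whole-file hashes, yet the same payload hash. The layout and the `make_file`, `payload_md5` and `file_md5` helpers are hypothetical.

```python
import hashlib
import struct

def make_file(tags: bytes, payload: bytes) -> bytes:
    """Build a toy file: 4-byte big-endian header length, header (tags), payload."""
    return struct.pack(">I", len(tags)) + tags + payload

def payload_md5(data: bytes) -> str:
    """Hash only the payload, skipping the length field and the header."""
    (header_len,) = struct.unpack(">I", data[:4])
    return hashlib.md5(data[4 + header_len:]).hexdigest()

def file_md5(data: bytes) -> str:
    """Hash the whole file, header included (what a regular file hash does)."""
    return hashlib.md5(data).hexdigest()

# Same payload, different tags (hypothetical example data).
payload = b"\x00\x01\x02" * 1000
a = make_file(b"title=Holiday", payload)
b = make_file(b"title=Holiday (copy); comment=re-tagged", payload)

print(file_md5(a) == file_md5(b))        # whole-file hashes differ
print(payload_md5(a) == payload_md5(b))  # payload hashes match
```

A real container is far more involved than this, which is why a demuxer such as ffmpeg is normally used to get at the stream data.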
therube

Re: Video mode -> Hash (Video), what does that do?

Post by therube »

Ah, so this: How to md5 the video track (only) in ffmpeg.

Never knew such a thing (option in ffmpeg) existed.
therube

Re: Video mode -> Hash (Video), what does that do?

Post by therube »

Neat. I'll have to play with that.

I'll assume you are using streamhash rather than stream copy, even for V+A (as it is worlds faster).
(And you could always hash the separate V & A hashes that streamhash gives to "generate" a V+A hash.
While that would differ from the hash that stream copy would give you, it could still uniquely identify a V+A to compare against others.)
And I'll assume you (always) compute and index both V & A, even if the user has asked to look for only one particular stream (V or A).


(I say these things before actually testing to see how things pan out - in reality.)
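
The "hash the separate V & A hashes" idea from this post could be sketched in Python as below. This is only one possible way to do it, not something the thread confirms any tool does; the `combine_stream_hashes` helper and the stream labels are hypothetical, and the per-stream digests are stand-ins computed from dummy bytes.

```python
import hashlib
from typing import Mapping

def combine_stream_hashes(stream_hashes: Mapping[str, str]) -> str:
    """Derive one digest from per-stream hex digests.

    Keys are stream labels (e.g. "0,v", "1,a"); values are hex digests.
    Labels are sorted so the result does not depend on insertion order.
    """
    h = hashlib.md5()
    for label in sorted(stream_hashes):
        h.update(f"{label}={stream_hashes[label].lower()}\n".encode("ascii"))
    return h.hexdigest()

# Stand-in digests; in practice these would come from a per-stream hasher.
video_digest = hashlib.md5(b"dummy video stream").hexdigest()
audio_digest = hashlib.md5(b"dummy audio stream").hexdigest()

combined = combine_stream_hashes({"0,v": video_digest, "1,a": audio_digest})
print(combined)
```

Two clips end up with the same combined value only if every per-stream digest matches, so it can stand in for a V+A hash when comparing files hashed the same way.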
therube

Re: Video mode -> Hash (Video), what does that do?

Post by therube »

ffmpeg -i input.mp4 -map 0 -c copy -f streamhash -hash md5 -
vs
ffmpeg -i INPUT -f streamhash out.sha256

So it seems it's the -c copy that makes the difference in speed?
With -c copy, ~25000 frames at a time.
Without it, ~350 frames at a time.

("At a time" of what, I don't know. Read? Hashed?)


And I don't know just what that means.
Are we hashing only particular blocks one way, & everything the other way, or... ?

Hashes are different with -c copy & without.
(I left "md5" / "sha256" as the links were written, but tested in each case using sha256 on the command line.)


(I'll have to read more, when I get some time...)
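
For experimenting with the two commands in this post, a small Python wrapper might look like the sketch below. As far as I understand the ffmpeg documentation, with `-c copy` the muxer receives the stored (still compressed) packets, while without it ffmpeg decodes each stream first, so the two variants hash different bytes; that would explain both the speed gap and the differing hashes, but treat it as an assumption to verify. The parser assumes output lines of the form `index,type,ALGO=digest`; the helper names are hypothetical and the sample digests are placeholders.

```python
import subprocess

def parse_streamhash(text: str) -> dict:
    """Parse lines like '0,v,MD5=abc...' into {(0, 'v'): ('MD5', 'abc...')}."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        index, kind, rest = line.split(",", 2)
        algo, digest = rest.split("=", 1)
        result[(int(index), kind)] = (algo, digest)
    return result

def stream_hashes(path: str, algo: str = "md5", copy: bool = True) -> dict:
    """Run ffmpeg's streamhash muxer on `path` (requires ffmpeg on PATH)."""
    cmd = ["ffmpeg", "-v", "error", "-i", path, "-map", "0"]
    if copy:
        cmd += ["-c", "copy"]  # hash stored packets instead of decoded frames
    cmd += ["-f", "streamhash", "-hash", algo, "-"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_streamhash(out)

# Placeholder output (not real digests), just to show the parsed shape.
sample = "0,v,MD5=" + "a" * 32 + "\n1,a,MD5=" + "b" * 32 + "\n"
print(parse_streamhash(sample))
```

Calling `stream_hashes("input.mp4")` and `stream_hashes("input.mp4", copy=False)` on the same file would show whether the per-stream digests differ between the two modes.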
therube

Re: Video mode -> Hash (Video), what does that do?

Post by therube »

V+A must be using a different algorithm or ffmpeg method,
which is much slower (than V only or A only).

need it be that way?

V (or A) is 0:05 (5 seconds)
V+A      is 4:01 (4 minutes)
with V+A, the same algorithm, twice
(separately, not in parallel, at that)

I don't really understand just what the different
hash methods "hash", or whether one way is more "correct"
(robust) than the other, but if they are "equal",
you would certainly want to use the 5 second method.


"faster":

timethis ffmpeg -v 0 -i xxx.webm -map 0 -c copy -f streamhash -hash md5 -

crc32 > md5   > sha512 > sha384 > sha256 - xxhash (for comparison)
2.416   2.742   3.768    4.256    4.594    3.941
(Oddly, ffmpeg's default, sha256, is the slowest, even slower than sha512.)

(different computer, other hashes)

TimeThis :  Command Line :  ffmpeg -v 4 -i xxx.mp4 -map 0 -c copy -f streamhash -hash md5 -
TimeThis :  Elapsed Time :  00:00:03.636
--- 
TimeThis :  Command Line :  ffmpeg -v 4 -i xxx.mp4 -map 0 -c copy -f streamhash -hash murmur3 -
TimeThis :  Elapsed Time :  00:00:02.221
--- 
TimeThis :  Command Line :  ffmpeg -v 4 -i xxx.mp4 -map 0 -c copy -f streamhash -hash ripemd128 -
TimeThis :  Elapsed Time :  00:00:03.822
(160/256/320, all slower)
--- 
TimeThis :  Command Line :  ffmpeg -v 4 -i xxx.mp4 -map 0 -c copy -f streamhash -hash adler32 -
TimeThis :  Elapsed Time :  00:00:02.242
---
With a 1 GB file, murmur3 and adler32 were over 1 sec quicker than md5.
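
For a rough feel of how much the algorithm itself contributes, independent of ffmpeg and disk reads, the sketch below times a few hashes over an in-memory buffer. It measures Python's `hashlib` and `zlib` implementations, not ffmpeg's, so the numbers will not match the timings above; the `time_hash` helper and the 16 MiB buffer size are arbitrary choices for illustration.

```python
import hashlib
import time
import zlib

def time_hash(name: str, data: bytes) -> float:
    """Return the seconds taken to hash `data` once with the named algorithm."""
    start = time.perf_counter()
    if name == "crc32":
        zlib.crc32(data)
    elif name == "adler32":
        zlib.adler32(data)
    else:
        hashlib.new(name, data).digest()
    return time.perf_counter() - start

# 16 MiB of zero bytes; the content does not affect the speed of these hashes.
data = bytes(16 * 1024 * 1024)
names = ("crc32", "adler32", "md5", "sha256", "sha384", "sha512")
timings = {name: time_hash(name, data) for name in names}

# Print fastest first.
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:8s} {seconds:.3f} s")
```

A single run like this is noisy; repeating each measurement and taking the minimum would give steadier figures.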