Difference Between Content Types?

The best solution for finding and removing duplicate files.
Bruce

Post by Bruce »

Can anyone explain the differences between byte-to-byte, MD5, SHA-1 and SHA-256?

I'm assuming byte-to-byte is the most accurate but slowest, but I'm not sure. If anyone can summarise them it would be much appreciated, as there is not a lot of detail in the manual.

Thanks
Fool4UAnyway

Post by Fool4UAnyway »

You might consult Wikipedia to find out how (exactly) these algorithms work.
Bruce

Post by Bruce »

Thanks for the reply, but as I said in the original post I was really hoping for a summary. I had already googled ... the descriptions and, as you said, it comes back with reams of in-depth tech details.

What I was looking for was more along the lines of whether different methods are quicker, slower, more accurate or better suited to particular file types than others.
therube

Post by therube »

Most accurate is byte-to-byte.

It compares the byte at position 1 in the first file with the byte at position 1 in the second file, then moves on one position at a time until it reaches the end of the file. If every byte has matched by the time you reach the end, you have a duplicate.
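To make that concrete, here is a minimal Python sketch of a byte-to-byte check. It is only an illustration, not how the program itself is implemented: the function name `files_identical` and the 64 KiB chunk size are my own assumptions, and it reads in chunks rather than literally one byte at a time, which gives the same answer with far fewer reads.

```python
import os


def files_identical(path_a, path_b, chunk_size=64 * 1024):
    """Return True only if both files contain exactly the same bytes.

    Hypothetical helper for illustration; chunk_size is an arbitrary choice.
    """
    # Files of different sizes can never be duplicates, so skip reading them.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            block_a = fa.read(chunk_size)
            block_b = fb.read(chunk_size)
            if block_a != block_b:
                return False  # first mismatch ends the comparison early
            if not block_a:
                return True  # both files ended without any mismatch
```

Note that a mismatch lets the comparison stop early, so files that differ near the start are rejected quickly. Only true duplicates have to be read all the way through.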

The others compute a hash value for each file and compare those values instead of the file contents. The /possibility/ of collisions exists, meaning that files could be tagged as duplicates when in fact they are not. MD5 is known to be broken in that respect. With SHA-1, and then SHA-256, a collision is far less likely.

That said, for most, using MD5 would be more than sufficient.
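As a rough sketch of the hash approach, the Python below computes a digest per file with the standard `hashlib` module and compares the digests. The helper names `file_digest` and `same_by_hash` are made up for this example, and I am not claiming this is what the program does internally.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=64 * 1024):
    """Return the hex digest of a file, read in chunks to limit memory use.

    Hypothetical helper; algorithm can be "md5", "sha1" or "sha256".
    """
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for block in iter(lambda: f.read(chunk_size), b""):
            hasher.update(block)
    return hasher.hexdigest()


def same_by_hash(path_a, path_b, algorithm="sha256"):
    """Treat two files as duplicates if their digests match."""
    return file_digest(path_a, algorithm) == file_digest(path_b, algorithm)
```

The practical difference from byte-to-byte is that each file only needs to be read once to get its digest, and that digest can then be compared against any number of other files. The trade-off is the collision possibility mentioned above: equal digests do not strictly prove equal contents.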

You would assume that a more complex algorithm would take longer to compute, but that may not necessarily be the case?
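If you want to check this on your own machine rather than assume, a small timing sketch like the one below would do. The function name `time_hashes` and the 32 MiB sample size are arbitrary choices of mine, and it only times the hashing of data already in memory, so it says nothing about disk or network read speed.

```python
import hashlib
import time


def time_hashes(data, algorithms=("md5", "sha1", "sha256")):
    """Return a dict of algorithm name -> seconds taken to hash `data` once.

    Hypothetical helper for a quick, unscientific comparison.
    """
    timings = {}
    for name in algorithms:
        start = time.perf_counter()
        hashlib.new(name, data).hexdigest()
        timings[name] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    sample = b"\0" * (32 * 1024 * 1024)  # 32 MiB of in-memory test data
    for name, seconds in time_hashes(sample).items():
        print(f"{name}: {seconds:.3f} s")
```

The results will vary by CPU and library build. On a real scan the time spent reading the files may well outweigh the time spent hashing them, in which case the choice of algorithm would matter less than you might expect.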

http://www.mscs.dal.ca/~selinger/md5collision/
Michael

Post by Michael »

Thanks for the input. What is the "fastest" way to search here? MD5?
Bruce

Post by Bruce »

Thanks therube. I'm running scans on a TeraStation with a lot of data, so they take a while, and as such I've started running the scans overnight.

I think I'll stick with byte-to-byte, as I'd prefer accuracy over time.

Cheers