DigitalVolcano Software Support

Posted: **Sun May 29, 2011 11:59 pm**

Can anyone explain the differences between byte-to-byte, MD5, SHA-1 and SHA-256?

I'm assuming byte-to-byte is the most accurate but the slowest, but I'm not sure. If anyone can summarise them it would be much appreciated, as there is not a lot of detail in the manual.

Thanks

Posted: **Mon May 30, 2011 9:58 pm**

You might consult Wikipedia to find out how (exactly) these algorithms work.

Posted: **Tue May 31, 2011 1:06 am**

Thanks for the reply, but as I said in the original post I was really hoping for a summary. I had already googled ... the descriptions and, as you said, it comes back with reams of in-depth technical details.

What I was looking for was more along the lines of whether different methods were quicker, slower, more accurate or better suited to different file types than others.

Posted: **Tue May 31, 2011 3:49 pm**

Most accurate is byte-to-byte.

It compares the byte at position 1 in the first file to the byte at position 1 in the second file, then increments the position until it reaches the end of the file. If, at the end, every byte has been compared and matched, you have a duplicate.
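A minimal Python sketch of the byte-to-byte idea, assuming both files are readable local paths. `files_identical` and `CHUNK_SIZE` are hypothetical names, not the program's actual implementation. It reads both files in chunks rather than literally one byte per call, which compares exactly the same bytes with far fewer reads, and it skips the read entirely when the sizes differ.

```python
import os

CHUNK_SIZE = 64 * 1024  # hypothetical: read 64 KiB at a time instead of one byte per call

def files_identical(path_a: str, path_b: str) -> bool:
    """Return True only if both files contain exactly the same bytes."""
    # Files of different sizes can never be byte-for-byte duplicates.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(CHUNK_SIZE)
            chunk_b = fb.read(CHUNK_SIZE)
            if chunk_a != chunk_b:
                return False  # the first mismatching chunk means not a duplicate
            if not chunk_a:
                return True  # both files exhausted with every byte matched
```

Because it can stop at the first difference, non-duplicates are usually rejected quickly; only true duplicates have to be read all the way to the end.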

The others compute a hash value for each file. The /possibility/ of collisions exists, meaning that files could be tagged as duplicates when in fact they are not. MD5 is known to be broken in that respect (colliding files can be deliberately constructed). With SHA-1, and then SHA-256, collisions are far less likely.
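A minimal Python sketch of the hash-based approach, using the standard-library `hashlib` module. `file_digest` and `group_by_digest` are hypothetical helper names, and this is not necessarily how the program itself groups files. Only the digests are compared, so files that land in the same group are duplicates with very high probability rather than certainty, which is the collision caveat above.

```python
import hashlib
from collections import defaultdict

def file_digest(path: str, algorithm: str = "md5", chunk_size: int = 64 * 1024) -> str:
    """Hash a file's contents with a hashlib algorithm name such as "md5", "sha1" or "sha256"."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Feed the file in chunks so large files never need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def group_by_digest(paths, algorithm: str = "md5"):
    """Return lists of paths sharing a digest; each list is a *probable* duplicate set."""
    groups = defaultdict(list)
    for p in paths:
        groups[file_digest(p, algorithm)].append(p)
    return [group for group in groups.values() if len(group) > 1]
```

Switching `algorithm` from `"md5"` to `"sha256"` changes only the collision resistance of the digest; the file still has to be read once in full either way.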

That said, for most people, MD5 would be more than sufficient.

You would assume that a more complex algorithm would take longer to compute, but that is not necessarily the case.
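One way to check the speed question on your own machine is a rough timing sketch like the one below, using only `hashlib` and `time.perf_counter`. `time_algorithms` is a hypothetical helper, and the random in-memory sample is an assumption standing in for real file data. Relative results depend on the CPU and Python build, and on a network drive, reading the data may well take longer than hashing it.

```python
import hashlib
import os
import time

def time_algorithms(data: bytes, algorithms=("md5", "sha1", "sha256"), repeats: int = 5) -> dict:
    """Return the best-of-`repeats` wall-clock seconds taken to hash `data` with each algorithm."""
    results = {}
    for name in algorithms:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            hashlib.new(name, data).digest()
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results

if __name__ == "__main__":
    sample = os.urandom(32 * 1024 * 1024)  # 32 MiB of random bytes as a stand-in for file data
    for name, seconds in time_algorithms(sample).items():
        print(f"{name:>7}: {seconds:.3f} s")
```

Taking the best of several runs reduces noise from other processes. It does not measure disk or network I/O, which a real duplicate scan also pays.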

http://www.mscs.dal.ca/~selinger/md5collision/

Posted: **Wed Jun 01, 2011 3:44 am**

Thanks for the input. What is the "fastest" way to search here? MD5?

Posted: **Wed Jun 01, 2011 8:48 am**

Thanks, therube. I'm running scans on a TeraStation with a lot of data, so they take a while, and as such I've started running the scans overnight.

I think I'll stick with byte-to-byte, as I'd prefer accuracy over speed.

Cheers
