Check duplicates using existing MD5 files
Posted: Thu Sep 02, 2010 10:07 pm
by Burt
I have MD5 files for all my archived files.
So when I am searching for duplicates, I let DC search for *.md5 files only. Comparing the small MD5 files is much faster than comparing the 1 GB files themselves.
It works well, but only if files have the same hash code AND the same filename. In that case the .MD5 files are identical.
But some files have different names and the same hash code stored in their MD5 files, and DC doesn't mark those as duplicates.
An option to ignore the filename stored in MD5 files, so that only the hash code is compared, would be nice to have.
Maybe I should look for another program that compares MD5 files, but I can't find one.
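What Burt asks for amounts to grouping MD5 files by the hash they contain rather than by their full text. Below is a minimal Python sketch of that idea, not a description of how DC works internally. It assumes each .md5 file holds lines in the `HASH *filename` layout shown in Burt's sample line further down the thread; the function names are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# A 32-hex-digit MD5 at the start of a line, e.g.
# "C3F75F29521F749F9C9FC5489544CB04 *filename.ext"; the filename is ignored.
HASH_RE = re.compile(r"^\s*([0-9A-Fa-f]{32})\b")


def hashes_in(md5_file: Path) -> set[str]:
    """Return every MD5 hash listed in one .md5 file, upper-cased."""
    # "utf-8-sig" silently drops a byte-order mark if the file has one.
    text = md5_file.read_text(encoding="utf-8-sig", errors="replace")
    return {m.group(1).upper() for line in text.splitlines()
            if (m := HASH_RE.match(line))}


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Map each hash that appears in more than one .md5 file to those files."""
    by_hash: dict[str, list[Path]] = defaultdict(list)
    # Match the extension case-insensitively so both .md5 and .MD5 are found.
    md5_files = sorted(p for p in root.rglob("*")
                       if p.is_file() and p.suffix.lower() == ".md5")
    for md5_file in md5_files:
        for h in hashes_in(md5_file):
            by_hash[h].append(md5_file)
    # Keep only hashes shared by two or more .md5 files.
    return {h: files for h, files in by_hash.items() if len(files) > 1}
```

Calling `find_duplicates(Path(r"D:\archive"))` (a hypothetical folder) would return each shared hash together with the .md5 files that list it, regardless of the filenames recorded inside them. Hashes are upper-cased so that listings written in different letter case still match.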
Nice, fast, and smart program btw!
Posted: Sun Sep 05, 2010 4:22 pm
by DV
It would be a good option to have if this is a common problem. I don't know how widely used MD5 files are, though. What do you use to generate them?
Posted: Wed Sep 08, 2010 9:25 am
by Burt
I use Md5Checker to create the MD5 files. As an alternative, I tried stripping the filename from the MD5 files, and then DC shows the duplicates really fast.
But stripping the filename is an extra step, and the stripped MD5 file is useless for normal MD5 verification.
Since I saw DC showing the MD5 value after comparing, I thought such an option would be easy to implement. But you are right, probably not many people use it this way.
Maybe I should find a program that automatically searches the MD5 files, strips the filename, and saves the result in another file. Then I could use those files with DC and still have the original MD5 files for normal use.
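A minimal Python sketch of such a program, assuming the `HASH *filename` line layout shown in Burt's sample line: it reads every .md5 file under a source folder, keeps only the hashes, and writes the result to the same relative path under a separate output folder, leaving the originals untouched. The folder layout and function names are illustrative assumptions, not part of Md5Checker or DC.

```python
import re
from pathlib import Path

# Keep only the 32-hex-digit hash at the start of each line; anything after it
# (the " *filename.ext" part) is dropped.
HASH_RE = re.compile(r"^\s*([0-9A-Fa-f]{32})\b")


def strip_names(text: str) -> str:
    """Reduce an MD5 listing to bare, upper-cased hashes, one per line.

    Lines without a leading hash (e.g. comment lines) are dropped so that two
    listings with the same hashes produce identical output.
    """
    hashes = [m.group(1).upper() for line in text.splitlines()
              if (m := HASH_RE.match(line))]
    return "\n".join(hashes) + "\n"


def mirror_stripped(src_root: Path, dst_root: Path) -> int:
    """Write a hash-only copy of every .md5 file under src_root into dst_root.

    The originals are never modified. dst_root should lie outside src_root.
    Returns the number of files written.
    """
    # Collect the list first so newly written files are never re-read.
    sources = [p for p in src_root.rglob("*")
               if p.is_file() and p.suffix.lower() == ".md5"]
    for src in sources:
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        text = src.read_text(encoding="utf-8-sig", errors="replace")
        dst.write_text(strip_names(text), encoding="utf-8")
    return len(sources)
```

DC could then be pointed at the output folder only, while the original .md5 files stay usable for normal verification.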
Posted: Wed Sep 08, 2010 11:39 am
by DV
TextCrawler could probably do that.
http://www.digitalvolcano.co.uk/content/textcrawler
You could get it to strip the line using a reg-ex such as .*\r\n to find and delete the first line (assuming there are only two lines in the file)
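To see how that pattern behaves, here it is with Python's `re` module rather than TextCrawler itself (TextCrawler's regex engine may differ in details). The header line in the sample text is hypothetical. `count=1` confines the deletion to the first line; a replace-all would also remove any later line ending in `\r\n`, which is why the two-line assumption matters.

```python
import re


def drop_first_line(text: str) -> str:
    # ".*" stops at "\n", so ".*\r\n" matches one CRLF-terminated line;
    # count=1 removes only the first such line.
    return re.sub(r".*\r\n", "", text, count=1)


sample = "; header comment\r\nC3F75F29521F749F9C9FC5489544CB04 *filename.ext"
print(drop_first_line(sample))  # -> C3F75F29521F749F9C9FC5489544CB04 *filename.ext
```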
Posted: Wed Sep 15, 2010 9:00 am
by Burt
Thanks for pointing to Textcrawler.
I can't figure out how to do it because I don't know anything about regex.
My md5 files all look like this:
C3F75F29521F749F9C9FC5489544CB04 *filename.ext
So what I have to strip is everything from the space + asterisk ( *) to the end of the line.
What command should I use for that?
Thanks again, Burt
Posted: Wed Sep 15, 2010 1:23 pm
by DV
Try this regex:
\*.*
Don't forget the leading space. Replace with nothing.
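The same substitution in Python's `re` module, with the leading space written out, is shown below as an illustration rather than TextCrawler's exact behaviour. One detail worth knowing: in Python, `.` does not match `\n` but does match `\r`, so on Windows-style line endings this sketch also removes the `\r`, leaving bare `\n` line breaks.

```python
import re

# A literal space, a literal asterisk, then the rest of the line.
NAME_PART = re.compile(r" \*.*")


def strip_filenames(text: str) -> str:
    # Replace each match with nothing, line by line, since "." stops at "\n".
    return NAME_PART.sub("", text)


print(strip_filenames("C3F75F29521F749F9C9FC5489544CB04 *filename.ext"))
# -> C3F75F29521F749F9C9FC5489544CB04
```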
Posted: Wed Sep 15, 2010 9:20 pm
by Fool4UAnyway
If you are only _comparing_ _lists_ of MD5's, wouldn't it be easier to just use a File Compare utility that shows (marks) inline differences, so you can tell by looking at the MD5 column (only) if there are any differences in the files' contents?
Posted: Wed Sep 15, 2010 9:28 pm
by Fool4UAnyway
You might even copy the contents of both lists into two Excel Sheet columns and apply a formula to compare only the first 32 characters of the texts in both columns.
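In Excel the per-row test would be something like `=LEFT(A1,32)=LEFT(B1,32)`. The same comparison, scripted in Python so nothing has to be imported by hand, might look like the sketch below. It assumes both lists are in the same order and that each line starts with the 32-character hash; the function name and the sample hashes are hypothetical.

```python
from itertools import zip_longest


def differing_rows(left: list[str], right: list[str]) -> list[int]:
    """Return 1-based row numbers whose first 32 characters (the MD5) differ.

    Case is ignored. If one list is longer, its extra rows count as differences.
    """
    return [row for row, (a, b) in enumerate(zip_longest(left, right, fillvalue=""), start=1)
            if a[:32].upper() != b[:32].upper()]


left = ["C3F75F29521F749F9C9FC5489544CB04 *a.ext", "D3F75F29521F749F9C9FC5489544CB04 *b.ext"]
right = ["c3f75f29521f749f9c9fc5489544cb04 *renamed.ext", "C3F75F29521F749F9C9FC5489544CB04 *b.ext"]
print(differing_rows(left, right))  # -> [2]
```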
Posted: Fri Sep 17, 2010 2:55 pm
by Burt
@Fool4Uanyway:
I'd rather automate it, since there are too many files to check by hand.
I also saw a website with instructions for Excel, but importing each new file is too much work.
@DV
The regex works, but if I replace it with nothing, a strange character still remains. This character is added by TextCrawler.
Also, is it possible to save the result to a new file automatically?
Posted: Fri Sep 17, 2010 8:43 pm
by Fool4UAnyway
Well, I guess you could use TextCrawler to strip everything after the MD5 from all files and then process those files with Duplicate Cleaner.
I am not sure whether WinMerge can do this, but I use ExamDiff Pro (paid software), which can ignore parts of lines. That lets you do a two-directory comparison that checks only the MD5 part of every line in each pair of files. The result is two side-by-side file lists showing which files are equal and which are not. Where there are differences, you can open that pair of files and see which lines differ.
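The idea of comparing only the MD5 part of each line across two folders can also be scripted. Below is a rough Python sketch, not a description of how ExamDiff Pro or WinMerge work: it pairs .md5 files by relative path, extracts the 32-character hash from each line, and classifies each pair as equal or different. It assumes the `HASH *filename` layout from Burt's sample line; the function names and report keys are hypothetical.

```python
import re
from pathlib import Path

# The 32-hex-digit MD5 at the start of a line; the filename part is ignored.
HASH_RE = re.compile(r"^\s*([0-9A-Fa-f]{32})\b")


def hash_lines(path: Path) -> list[str]:
    """The hashes in one .md5 file, in order, upper-cased."""
    text = path.read_text(encoding="utf-8-sig", errors="replace")
    return [m.group(1).upper() for line in text.splitlines()
            if (m := HASH_RE.match(line))]


def index_md5_files(root: Path) -> dict[str, Path]:
    """Map each .md5 file's path relative to root to its full path."""
    return {p.relative_to(root).as_posix(): p for p in root.rglob("*")
            if p.is_file() and p.suffix.lower() == ".md5"}


def compare_dirs(left: Path, right: Path) -> dict[str, list[str]]:
    """Classify .md5 files by comparing only their hashes, pairing by relative path."""
    lhs, rhs = index_md5_files(left), index_md5_files(right)
    report: dict[str, list[str]] = {
        "equal": [],
        "different": [],
        "left_only": sorted(lhs.keys() - rhs.keys()),
        "right_only": sorted(rhs.keys() - lhs.keys()),
    }
    for name in sorted(lhs.keys() & rhs.keys()):
        same = hash_lines(lhs[name]) == hash_lines(rhs[name])
        report["equal" if same else "different"].append(name)
    return report
```

Pairs listed under "different" are the ones worth opening side by side to see which lines changed.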