Check duplicates using existing MD5 files

Burt

Post by Burt »

I have MD5 files for all my archived files.
So when I am searching for duplicates, I let DC search for *.md5 files only. It is much faster to compare MD5 files than 1 GB files.
It works well, but only if the files have the same hash code AND the same filename. In that case the .MD5 files are identical.

But some files have different names and the same hash code stored in their MD5 files, and DC doesn't mark those as duplicates.

An option to ignore the filename stored in MD5 files would be nice to have, so that only the hash code is compared.

Otherwise I may have to look for another program that compares MD5 files, but I can't find one.
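
For illustration only, here is a rough Python sketch of that kind of comparison: it reads the hash code from every MD5 file and ignores the stored filename. It is not a DC feature. The function names are made up, and it assumes each file starts with a line shaped like "HASH *filename.ext" and uses a lowercase .md5 extension.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def read_hash(md5_file: Path) -> str:
    """Return the hash code from the first line, ignoring the stored filename."""
    # "utf-8-sig" drops a byte order mark if the file happens to start with one.
    lines = md5_file.read_text(encoding="utf-8-sig", errors="replace").splitlines()
    if not lines or not lines[0].strip():
        return ""
    # Assumed layout: "<hash> *<filename>"; the hash is the first token.
    return lines[0].split()[0].upper()


def group_by_hash(folder: Path) -> Dict[str, List[Path]]:
    """Map each hash code to the MD5 files that contain it (duplicates only)."""
    groups = defaultdict(list)
    # Note: on case-sensitive file systems "*.md5" will not match ".MD5".
    for md5_file in sorted(folder.rglob("*.md5")):
        hash_code = read_hash(md5_file)
        if hash_code:
            groups[hash_code].append(md5_file)
    # Keep only hash codes that occur in more than one MD5 file.
    return {h: files for h, files in groups.items() if len(files) > 1}
```

Calling group_by_hash(Path("some/archive/folder")) on a folder of your own would return one entry per hash code that appears in two or more MD5 files, whatever filenames are stored next to it.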

Nice, fast, and smart program btw!
DV

Post by DV »

It would be a good option to have if this is a common problem. I don't know how widely used MD5 files are, though. What do you use to generate the files?
Burt

Post by Burt »

I use Md5Checker to create the MD5 files. As an alternative, I tried stripping the filename from the MD5 files, and then DC shows the duplicates really fast.
But stripping the filename is an extra operation, and it makes the MD5 file useless for normal MD5 use.
Since I saw DC showing the MD5 value after comparing, I thought it would be easy to implement such an option. But you are right, probably not many people use it this way.

Maybe I should find a program that automatically searches the MD5 files, strips the filename, and saves the result in another file. Then I could use it with DC and still have the original MD5 files for normal use.
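
A small script could cover that workflow. The sketch below is only an assumption of how it might look, not an existing tool: write_hash_only_copies is a made-up name, the target folder must lie outside the source folder, and every line is assumed to begin with the hash code followed by a space.

```python
from pathlib import Path


def write_hash_only_copies(source_dir: Path, target_dir: Path) -> int:
    """Write a hash-only copy of every .md5 file; the originals stay untouched."""
    count = 0
    for md5_file in sorted(source_dir.rglob("*.md5")):
        text = md5_file.read_text(encoding="utf-8-sig", errors="replace")
        # Keep the first token (the hash code) of each non-blank line.
        hashes = [line.split()[0] for line in text.splitlines() if line.strip()]
        # Mirror the folder layout of source_dir inside target_dir.
        out_file = target_dir / md5_file.relative_to(source_dir)
        out_file.parent.mkdir(parents=True, exist_ok=True)
        out_file.write_text("\n".join(hashes) + "\n", encoding="utf-8")
        count += 1
    return count
```

DC could then be pointed at the target folder, while the original MD5 files remain usable with Md5Checker.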
DV

Post by DV »

TextCrawler could probably do that.
http://www.digitalvolcano.co.uk/content/textcrawler

You could get it to strip the line using a regex such as .*\r\n to find and delete the first line (assuming there are only two lines in the file).
Burt

Post by Burt »

Thanks for pointing me to TextCrawler.
I can't figure out how to do it because I have no knowledge of regex.

My md5 files all look like this:
C3F75F29521F749F9C9FC5489544CB04 *filename.ext

So what I have to strip is everything from the space+asterisk ( *) to the end of the line.
What command should I use for that?

Thanks again, Burt
DV

Post by DV »

Try this regex:

\*.*

Don't forget the leading space in front of the backslash. Replace with nothing.
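
To show what the pattern matches, here is the same regex tried out in Python rather than TextCrawler (the two regex engines may differ in details, and strip_filename is just an illustrative name). The pattern is a space, a literal asterisk, and then everything up to the end of the line.

```python
import re

# A space, an escaped (literal) asterisk, then any characters to the end of the line.
FILENAME_PART = re.compile(r" \*.*")


def strip_filename(line: str) -> str:
    """Remove ' *filename.ext' from one line of an MD5 file."""
    return FILENAME_PART.sub("", line)


sample = "C3F75F29521F749F9C9FC5489544CB04 *filename.ext"
print(strip_filename(sample))  # prints C3F75F29521F749F9C9FC5489544CB04
```

In Python the dot does not match a newline by default, so a line feed at the end of the line is left in place after the replacement.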

Fool4UAnyway

Post by Fool4UAnyway »

If you are only _comparing_ _lists_ of MD5s, wouldn't it be easier to just use a File Compare utility that shows (marks) inline differences, so you can tell by looking at the MD5 column (only) if there are any differences in the files' contents?
Fool4UAnyway

Post by Fool4UAnyway »

You might even copy the contents of both lists into two Excel Sheet columns and apply a formula to compare only the first 32 characters of the texts in both columns.
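
The same first-32-characters comparison can be sketched outside Excel. This Python version is only an illustration with made-up names. It assumes both lists are in the same order and that every entry starts with the 32-character hash code.

```python
from typing import Iterable, List, Tuple

HASH_LENGTH = 32  # an MD5 hash written in hexadecimal is 32 characters long


def differing_rows(left: Iterable[str], right: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (row number, left hash, right hash) for every row whose hashes differ."""
    diffs = []
    # zip stops at the end of the shorter list, so extra rows are not reported.
    for row, (a, b) in enumerate(zip(left, right), start=1):
        hash_a = a[:HASH_LENGTH].upper()
        hash_b = b[:HASH_LENGTH].upper()
        if hash_a != hash_b:
            diffs.append((row, hash_a, hash_b))
    return diffs
```

An empty result means every compared row carries the same hash code, even where the stored filenames differ.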
Burt

Post by Burt »

@Fool4UAnyway:
I'd rather automate it, since there are too many files to check by hand.
I also saw a website with instructions for Excel, but it is too much work to import each new file.

@DV
The regex works, but when I replace the match with nothing, a strange character still remains. This character is added by TextCrawler.
Also, is it possible to save the result to a new file automatically?
Fool4UAnyway

Post by Fool4UAnyway »

Well, I guess you could use TextCrawler to strip everything after the MD5 from all files and then process these files with Duplicate Cleaner.

I am not sure if WinMerge can do this, but I use ExamDiff Pro (paid software), which lets you ignore parts of lines. That allows a two-directory comparison that checks only the MD5 part of each line in every file pair. The result is two side-by-side file lists showing which files are equal and which are not. Where there are differences, you can open the pair of files and see which lines differ.