Duplicate Cleaner 2.0 new feature list


Post by DV »

Untick 'protect exe/dll' in your options menu.

See FAQ here:
http://www.digitalvolcano.co.uk/content ... leaner/faq

Post by Jay »

Was hoping 2.0 would enable exclusion of savedwebpage_file\ directories. Fortunately, these are pretty easy to spot at a glance anyway, but preventing them from showing up at all would be great. Fine work in any case!

Post by Dr. John »

I've just downloaded and tried Duplicate Cleaner (though I'm not sure which version). I did an initial test on a small network drive (about 50GB) and compared it with my favorite duplicate file finder (Advanced Space Hunter). ASH does the same job on this drive in half the time. But ASH has limitations, mainly the number of files it can handle at a time, so I'd like to replace it.

My guess is that ASH does a couple of things that Duplicate Cleaner doesn't. One is to sort the files by size and skip comparisons between files of unequal sizes (i.e., there is no reason to read and compute MD5s for those files). The second is to abandon a byte-by-byte comparison of two files (or whatever comparison record length you choose) as soon as it finds that the files aren't equal. This seems to be much faster than calculating the MD5s for two files (especially when they are large) and then comparing them.

If you could incorporate these approaches for the match-by-content scans, this software would become the new norm for me.
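To make the idea concrete, here is a rough Python sketch of the two shortcuts I mean (the function names and chunk size are just my own illustration, not anything taken from ASH or Duplicate Cleaner):

import os
from collections import defaultdict

def group_by_size(paths):
    # Bucket candidates by size first: files of unequal size can never match,
    # so they never need to be read or hashed at all.
    buckets = defaultdict(list)
    for p in paths:
        buckets[os.path.getsize(p)].append(p)
    return {size: files for size, files in buckets.items() if len(files) > 1}

def same_content(path_a, path_b, chunk_size=64 * 1024):
    # Byte-by-byte comparison that gives up at the first differing chunk,
    # instead of hashing both files end to end and comparing the hashes.
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False   # early exit: the rest of the files is never read
            if not chunk_a:    # both files ended at the same point, so they match
                return True

The size grouping alone usually eliminates most candidates before any file content is read, which is where I suspect the speed difference comes from.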



Post by Tom »

Dr John: ASH vs DC
ASH was a great program...but when I migrated from Win95 to XP I found that it no longer displayed a full window nor was it resizable. It had another minor problem I can't recall at the moment. But the duplicate finding still worked well and I used it a lot with XP.

The biggest difference I find in favor of Duplicate Cleaner is that ASH insisted on scanning ALL of the drive first to create a database (I guess) of the files on that drive, and only then did you get to select the folders you wanted to search for dupes. The duplicate search might have been faster than DC's, but that was after waiting for the total drive scan.

I very much like that DC lets you restrict the search from the beginning to only the folders you want to search...and 90% of my duplicate searching is in a block of known folders. I haven't tried a total drive scan as I haven't needed to. While DC can be slowed if some other program such as a virus scanner is running in the background, I would suspect that the total scan time (at least in my circumstances) would be less with DC than with ASH, since ASH effectively makes you scan twice.

Too bad ASH's author has not updated the program in years to make it more compatible with Windows versions after Win95. (I have no experience with either Win98 or ME, so I don't know how it worked with them.) I'm currently using XP on an older desktop that gets minimal use and Win7 x64 SP1 on a laptop, and I'm having no problems with DC at all.

Post by Tom »

Dr John: additional comments on ASH vs DC.

I am not conversant with the techniques used in file comparisons, so I am unfamiliar with the differences among the byte-by-byte, MD5, SHA-1 and SHA-256 methods that DC uses...I did use all 4 methods on a recent search of 300 folders with a total of 31,000 files, finding 5 sets of duplicates, and found no significant time difference...but that's probably too small a number of dupes to show a difference.
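From what I can tell, though, the three hash methods all boil down to the same loop over a file's bytes, just with a different digest; here is a minimal Python sketch using the standard hashlib module (the path and chunk size are only placeholders, not DC's actual implementation):

import hashlib

def file_digest(path, algorithm="md5", chunk_size=64 * 1024):
    # Works the same for "md5", "sha1" or "sha256"; only the digest differs.
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Matching digests mean the files are almost certainly identical;
# differing digests guarantee they are not.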

Also, DC does allow you to sort the duplicates by size.

Post by Dr John »

Tom,

Thanks for your comments. I'm not sure what techniques ASH uses versus those used by Duplicate Cleaner. I only wanted to make the observation that, in my experience, ASH was faster (by a factor of 2) in finding duplicates on network drives. This suggests that there is room for improvement (Isn't there always?) in Duplicate Cleaner.

In my case, I perform automatic backups every night to a backup server (about 500GB) that also logs various versions of files. If I move a file (actually, usually a folder) from one location to another on the main drive, then I'll get duplicates on the backup server. Unfortunately, the number of files on this backup server is greater than can conveniently be managed by ASH. And the server is slow (meaning that it takes a long time to access files). So duplicate file finding software that introduces unnecessary activity (my hypothesis about Duplicate Cleaner) makes this process take excessively long (e.g., more than the time between backup events). ASH seems to be very fast compared to anything else I've tried, but can't handle the number of files that can be stored in 500GB. While the backup process doesn't seem to be limited by the slow speed of this server, 500GB isn't really enough to just let duplicates accumulate without end, so I do need to organize it from time to time. I'm hopeful that Duplicate Cleaner will eventually enable me to do so conveniently.

BTW, ASH always sorted the duplicates by size, as well as searching for them by size (largest first), so it was very productive in terms of recovering the most disk space in the least amount of time, which was especially valuable when you needed to interrupt a scan for duplicates.
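Just to illustrate what I mean by "largest first", here is a tiny Python sketch (my own, not anything from either program) that orders duplicate groups by the space a cleanup would recover:

import os

def largest_first(duplicate_groups):
    # Each group is a list of equal-size, same-content file paths; the space
    # recoverable from a group is one copy's size times the redundant copies.
    # Putting the biggest groups first means an interrupted cleanup still
    # frees the most space.
    return sorted(duplicate_groups,
                  key=lambda group: os.path.getsize(group[0]) * (len(group) - 1),
                  reverse=True)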

In case it isn't obvious, I'm not really trying to advocate for ASH (which is long gone), but trying to make suggestions for improving Duplicate Cleaner based upon my previous experience. I hope that it helps.


Post by dcwul »

"Auto-tag" shortest folderpath.
example:
[X] file xyz is in folder x:\wks
[_] file xyz is in folder x:\wks\1st-subfolder\ThisSubject

Files in a 'specialized' folder, alongside other files on the same subject, should often be kept, whereas duplicates heaped into one root folder usually got there by accident.
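A rough Python sketch of the rule I have in mind (only my illustration, not DC's actual selection logic): in each duplicate group, tag the copy with the fewest folder levels and keep the deeper ones.

def pick_copy_to_tag(duplicate_group):
    # Tag the copy sitting in the shallowest folder (fewest path separators),
    # so the copies kept are the ones filed away in deeper, 'specialized' folders.
    return min(duplicate_group, key=lambda path: path.count("\\"))

# e.g. pick_copy_to_tag([r"x:\wks\xyz", r"x:\wks\1st-subfolder\ThisSubject\xyz"])
# returns r"x:\wks\xyz", matching the [X] in the example above.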

