much faster search option

Post by Dan »

The freeware program Duplicate File Finder 3.5 has a "Fast Search (less accurate)" option, where the program scans only the first and last 10 MB of each file. It is very useful for comparing large video files and saves tons and tons of time, especially when those files are stored on external USB disks whose read speed is painfully slow. With this option added, your Duplicate Cleaner will become unbeatable!
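
To make the idea concrete, here is a minimal Python sketch of a "first and last 10 MB" fingerprint. It is only an illustration of the technique described above, not how Duplicate File Finder actually implements it: the function name `head_tail_fingerprint`, the use of SHA-256, and the decision to include the file size in the result are all assumptions.

```python
import hashlib
import os

CHUNK = 10 * 1024 * 1024  # 10 MB, the figure quoted in the post


def head_tail_fingerprint(path, chunk=CHUNK):
    """Return (size, digest) computed from only the head and tail of a file.

    Hypothetical helper: files that differ only in the middle will get the
    same fingerprint, which is why this kind of scan is "less accurate".
    """
    size = os.path.getsize(path)
    h = hashlib.sha256()
    with open(path, "rb") as f:
        h.update(f.read(chunk))  # first `chunk` bytes (or the whole file)
        if size > chunk:
            # Seek to the tail, but never re-read bytes the head covered.
            f.seek(max(size - chunk, chunk))
            h.update(f.read(chunk))
    return size, h.hexdigest()
```

Two files would be treated as likely duplicates when their fingerprints are equal. For a multi-gigabyte video this reads at most 20 MB per file, which is where the time saving on slow USB disks would come from.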

Also, is there a way to Search inside the list of results? When we have 1000 dups and we need to find just one file, can we?

Post by therube »

I believe one of the posts here said he does something like that already?

I'm working on some interesting results, but I won't have everything together till tomorrow sometime...

Post by therube »

Appears average file /size/ makes a big difference.


With MP3s, so say an average file size of 5 MB,
DFF Fast == 24 min vs. 206 Byte-to-Byte == 12 min.
So 206 took half the time.
DFF non-Fast was worse yet == 35 min.


With Video Clips, much larger files in general,
much smaller total number of files,
DFF Fast == 8.8 sec vs. 206 Byte-to-Byte == 40 sec.
So DFF Fast was much quicker.
DFF non-Fast == 41.71 sec, so that method is about the same as 206.


Both programs returned the same results relative to each other, & regardless of the search methods used.


So this tells me that DFF's Fast method /can/ be very effective when scanning the "right" types of files. It completely fails on my MP3 test. /Guessing/ that it is not the method used, but more the sheer number of (MP3) items scanned where it is losing efficiency?


Program interface between the two is day & night, 206 wins.

206 reports the total scan time on its summary screen, & it /is/ accurate. For DFF I had to use a timer.

206 uses more memory, say 50 - 75 MB, vs. 25 MB for DFF. CPU usage is more varied, jumping from a nominal number to about 10% (generally with DFF) & then varying more, peaking around 25% (25 being 1 of 4 cores in a quad) perhaps more so with 206 & depending on the method used & file types scanned too.

Regardless of MEM or CPU used, I didn't consider it a detriment or advantage to either program, not particularly noticing any impact on system performance. And at times, even though DFF showed lower numbers, it seemed as if it had a greater effect on the overall system. (That is not a contradiction.)


(I'll try to get more results up later yet.)

Post by DV »

Very interesting results, thanks for posting them.

In byte-by-byte mode, DC scans the two files against each other, and aborts the scan at the first byte difference.
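
As a rough illustration of that early-abort comparison, here is a minimal Python sketch. It is not DC's actual code: the name `files_identical`, the 1 MB block size, and the up-front size check are assumptions, and it compares block by block rather than literally one byte at a time.

```python
import os


def files_identical(path_a, path_b, block_size=1024 * 1024):
    """Compare two files directly, stopping at the first differing block."""
    # Assumption: files of different sizes are rejected without reading them.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            block_a = fa.read(block_size)
            block_b = fb.read(block_size)
            if block_a != block_b:
                return False  # abort: no need to read the rest of either file
            if not block_a:
                return True  # both files ended with no difference found
```

Non-duplicates that differ early are rejected after a single read, while true duplicates always cost a full read of both files.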

In 'Hashed' mode, DC generally hashes a little bit of the file, to enable quick difference detection, then does a full hash if the hashes match.

Post by therube »

More to chew on ...

http://pastebin.com/07QbR3af