Performance improvement


Performance improvement

Post by Stéphane BARIZIEN »

Process Monitor reveals you are reading files 4KB at a time.

Reading with larger e.g. 64KB or even 1MB buffers would probably increase performance significantly for large files.
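
For illustration only (Duplicate Cleaner isn't written in Python, and the helper name and SHA-1 choice here are my own assumptions), a minimal sketch of reading in large chunks with a configurable buffer:

    import hashlib

    def file_digest(path, buf_size=1024 * 1024):
        # Hash the file in buf_size chunks; a 1 MiB buffer means far
        # fewer read calls than the 4 KB reads seen in Process Monitor.
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(buf_size), b""):
                h.update(chunk)
        return h.hexdigest()

Going from 4 KB to 1 MB cuts the number of read calls per file by a factor of 256, which matters most for large files.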

Just my €0.02

Post by DV »

Thanks, will experiment with this.

Re: Performance improvement

Post by Myth »

It's likely that Duplicate Cleaner already employs the following optimisation, but here goes anyway:

The idea is to reduce the number of files checked against time-consuming criteria by first eliminating obvious mismatches with cheaper checks, typically using multiple passes, with the most intensive comparisons occurring in the final pass.

Pass 1: Compare on criteria that can be read straight from the master file table (name, size, attributes, etc.). Size is particularly handy: two files with identical content must be the same size, so a mismatch in file sizes rules out a match in content. Pass 1 should result in a preliminary list of candidate file groups.

Pass 2: Step through Pass 1's list of file groups, checking each group against the more intensive criteria (such as content matching). A sketch of this two-pass approach follows.
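
To make the idea concrete, here is a minimal Python sketch of the two passes (hypothetical names throughout; not a claim about Duplicate Cleaner's actual implementation, and assuming a hash-based content check rather than byte-by-byte comparison):

    import hashlib
    import os
    from collections import defaultdict

    def _digest(path, buf_size=1024 * 1024):
        # Content hash, read in large buffers as suggested earlier in the thread.
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(buf_size), b""):
                h.update(chunk)
        return h.hexdigest()

    def find_duplicates(paths):
        # Pass 1: group files by size -- cheap metadata from the file table.
        by_size = defaultdict(list)
        for p in paths:
            by_size[os.path.getsize(p)].append(p)

        # Pass 2: only same-size groups get the expensive content check.
        dupes = []
        for group in by_size.values():
            if len(group) < 2:
                continue  # a unique size cannot have a content match
            by_hash = defaultdict(list)
            for p in group:
                by_hash[_digest(p)].append(p)
            dupes.extend(g for g in by_hash.values() if len(g) > 1)
        return dupes

Pass 1 costs only one metadata lookup per file; files with a unique size are never opened at all, so the expensive reads in Pass 2 are limited to genuine candidates.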