Dear Duplicate Cleaner developer!
I would like to know why a given operation is so slow. I selected some folders and specified that duplicates are files with the same name, the same size, and the same date. Duplicate Cleaner scanned all the folders and files in less than 20s, finding about 9,500 folders and about 400,000 files. It then took more than 20 minutes to show the duplicates, consuming 100% of the CPU the whole time!
I don't understand why this has to be so slow and CPU-intensive, since there is no need to read file contents. I wrote a small program that works as follows:
1. Create a list where each entry is a record for a found file.
2. Break each record into the following fields: 'name without path', date, size, 'path without name', group. The group is initially zero.
3. Sort the list by 'name without path', date, size, 'path without name'.
4. Iterate over the list to assign groups. The first group is 1, and each time the next record has a different value for the key ('name without path', date, size), the group number is incremented.
Doing this, I can find the duplicates in less than 4s! It takes more time to scan the folders than to find the duplicates! Am I missing something important? (A rough sketch of my approach is shown below.)
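For reference, here is a minimal Python sketch of the sort-and-group idea described above. The field names, the use of the modification time as the date, and the way files are listed are my own assumptions for illustration, not Duplicate Cleaner's internals:

```python
import os

def find_duplicate_groups(paths):
    """Group files by (name, date, size) without reading file contents."""
    records = []
    for path in paths:
        st = os.stat(path)
        records.append({
            "name": os.path.basename(path),   # 'name without path'
            "date": int(st.st_mtime),         # assuming modification time as the date
            "size": st.st_size,
            "dir": os.path.dirname(path),     # 'path without name'
            "group": 0,
        })

    # Sort by the grouping key first, then by directory, so potential
    # duplicates end up adjacent in the list.
    records.sort(key=lambda r: (r["name"], r["date"], r["size"], r["dir"]))

    # Assign group numbers: start at 1 and increment whenever the key changes.
    group = 1
    prev_key = None
    for r in records:
        key = (r["name"], r["date"], r["size"])
        if prev_key is not None and key != prev_key:
            group += 1
        r["group"] = group
        prev_key = key

    return records
```

With the records already in memory from the scan, this is essentially one sort plus one linear pass, which is consistent with it finishing in a few seconds for 400,000 files.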
Slowness when 'Same Content' is not used.
DC 1.4 scans quickly but is slow at populating the list at the end, due to limitations of the ListView control used. It is also performing other per-file operations, such as getting file-type info/attributes and counting hard links.
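Counting hard links and reading extra attributes typically means touching every file individually, which adds up over 400,000 files. A rough sketch of that per-file cost, using Python's os.stat as a stand-in (the actual implementation will differ):

```python
import os

def extra_file_info(path):
    """Per-file metadata beyond name/size/date.
    Each call hits the file system once, so 400,000 files
    mean 400,000 extra lookups on top of the scan itself."""
    st = os.stat(path)
    return {
        "hardlinks": st.st_nlink,                     # number of hard links to this file
        "readonly": not os.access(path, os.W_OK),     # basic attribute check
    }
```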
The next gen version is much faster at populating, so it shouldn't have this problem!