Is it meant to be so slow? I thought it was finished at 28 mins, then it restarted doing this hash calculation thing. Really not sure what's going on. It's completed 20% of about 935 GB in approx 1 hour 45 mins. Is that usual?
Can anyone advise please?
finding duplicate cleaner to be very slow.
Re: finding duplicate cleaner to be very slow.
How about more information?
At the least, the log file from your last scan.
- DigitalVolcano
- Site Admin
- Posts: 1864
- Joined: Thu Jun 09, 2011 10:04 am
Re: finding duplicate cleaner to be very slow.
It often does a second pass of hash calculation - this is normal. 1TB of files can take a while depending on how many same-size files there are (and how large they are). External and network drives can also be slower due to their read speed.
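For anyone wondering why the number of same-size files matters, here is a rough sketch of the staged approach that duplicate finders commonly use. It is not Duplicate Cleaner's actual code: the function names and the 64 KB partial read are assumptions made purely for illustration. Files are grouped by size first, then by a hash of the first part of each file, and only the survivors get a full hash (the "second pass").

```python
import hashlib
import os
from collections import defaultdict

# Assumed size of the "partial" read; not a documented Duplicate Cleaner value.
PARTIAL_BYTES = 64 * 1024
CHUNK = 1024 * 1024


def md5_of(path, limit=None):
    """Return the MD5 hex digest of a file, reading at most `limit` bytes if given."""
    digest = hashlib.md5()
    remaining = limit
    with open(path, "rb") as handle:
        while remaining is None or remaining > 0:
            size = CHUNK if remaining is None else min(CHUNK, remaining)
            chunk = handle.read(size)
            if not chunk:
                break
            digest.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return digest.hexdigest()


def group_by(paths, key):
    """Group paths by key(path) and keep only groups with more than one member."""
    groups = defaultdict(list)
    for path in paths:
        groups[key(path)].append(path)
    return [group for group in groups.values() if len(group) > 1]


def find_duplicates(paths):
    """Return lists of paths whose contents have the same full MD5 hash."""
    duplicates = []
    # Stage 1: a file with a unique size cannot have a duplicate, so it is never read.
    for same_size in group_by(paths, os.path.getsize):
        # Stage 2: a cheap hash of the first PARTIAL_BYTES removes most non-matches.
        for same_prefix in group_by(same_size, lambda p: md5_of(p, PARTIAL_BYTES)):
            # Stage 3: only the remaining candidates are hashed in full.
            duplicates.extend(group_by(same_prefix, md5_of))
    return duplicates
```

With a layout like this, the full-hash stage has to read every byte of each remaining candidate, so a handful of large same-size files can dominate the total scan time.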
-
- Posts: 2
- Joined: Sun Jun 23, 2024 1:20 pm
Re: finding duplicate cleaner to be very slow.
Been using Duplicate Cleaner for nearly as long as this thread has been around! (since 2014) EXCELLENT tool!
Yesterday, I upgraded to version 5.22.0. The last version I used was 5.17.0.2 Pro Edition and is showing as registered in the "Home" tab. Started a scan on ~1 TB of data. Looks like half of the files already had MD5 hashes in a cache leaving the other half. After almost a DAY it's just a little over 5% done (10% of the uncalculated half). At this rate it will take over 10 days to complete.
This is on a workstation laptop with the fastest SSDs available. I've done nearly identical scans many times before and they took nowhere near this long to finish, even when one of the drives was an externally attached SSD. This scan is of my C:\Users\<username>\Desktop folder which, as mentioned, is on an SSD with 1.8 TB of space (28% free).
Is this a problem with the new version? Am I somehow just not remembering things properly? I don't see how the latter could be the case when I've always been amazed at how fast the scan was. Currently, in regular mode, each file is taking about 2 secs and the scan speed is only around 2 to 5 KB/s (the drive is capable of over 3,000 MB/s sequential reads and 60 MB/s random reads). Looking at Resource Monitor, Duplicate Cleaner appears to be CPU bound. There are 8 physical cores in the laptop, each with hyperthreading. Duplicate Cleaner is running 28 threads and is consuming 6% of the overall CPU resources available. All other processes are consuming less than 1%. I can't tell from Resource Monitor which of its threads are running on which logical processor. That said, core 3 thread 2 is at 100% about 90-95% of the time (pretty much maxed out) and core 1 thread 1 is at about 50% all of the time. Given the utilization of the other running processes, both would appear to be due to Duplicate Cleaner.
I may try reverting back to an older version of Duplicate Cleaner. However, I would like to know if this is expected behavior before I spend too much more time looking into it as an issue.
Thanks!
Other details:
Regular mode
All file extensions
All file size
Any date
Any dimensions
Match full folder name
Ignore "Copy" part of filename
585,943 files scanned in 44,951 folders (963 GB)
Calculating hashes (MD5 Partial) <4:16:55:00 @ 5.37 KB/s; 33,093 / 299,150 [This is at 22:30:00 hours]
Fetching hashes from cache (MD5) 229,611
VMDK 226 GB
DMP 91.4 GB
MP4 90.7 GB
JPG 63.1 GB
VDI 55.5 GB
HEIC 49.1 GB
MOV 36.9 GB
OVA 32.8 GB
003 31.1 GB
SAV 19.9 GB
ISO 18.4 GB
GZ 17.2 GB
002 16.2 GB
005 16.2 GB
006 15.7 GB
Other 192 GB
-
- Posts: 2
- Joined: Sun Jun 23, 2024 1:20 pm
Re: finding duplicate cleaner to be very slow.
Switching to version 5.16 at first was much faster ***BUT*** this looks like it was because I didn't have search in .zip files enabled. I went back to version 5.22.0 and turned this off which made a massive difference.
The file report mentions 23.0 GB being of type ZIP. Oddly, the number of files reported is double ~500K (with scan inside .zip enabled) vs ~250K (with this disabled). Still it seems odd that the processing time would be more than 120x... but I have no experience here.
Thoughts?
- DigitalVolcano
- Site Admin
- Posts: 1864
- Joined: Thu Jun 09, 2011 10:04 am
Re: finding duplicate cleaner to be very slow.
Scanning inside zip/archive files has a massive impact on the scan speed, as it has to unzip each file to scan it (assuming you are doing a content scan). That's why it's off by default. Once the hashes are cached for the zip things will be faster.
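To illustrate where the extra cost comes from, here is a minimal sketch of hashing the contents of a zip archive with Python's standard `zipfile` module. This is only an illustration of the general technique, not how Duplicate Cleaner is implemented; `hash_zip_members` is a made-up name. The point is that every member has to be decompressed before its content can be hashed.

```python
import hashlib
import zipfile


def hash_zip_members(zip_path):
    """Return {member name: MD5 hex digest} for every file inside a zip archive."""
    digests = {}
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries have no content to hash
            digest = hashlib.md5()
            # archive.open() decompresses the member as it is read, which costs
            # CPU time on top of the disk read of the archive itself.
            with archive.open(info) as member:
                for chunk in iter(lambda: member.read(1024 * 1024), b""):
                    digest.update(chunk)
            digests[info.filename] = digest.hexdigest()
    return digests
```

Each archive member also counts as a file of its own, which would be consistent with the file count roughly doubling when scanning inside zips is enabled.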