Perhaps it's time to support multithreading scanning.
With SSDs being very common now, disk speed is often no longer the bottleneck. Can you please consider incorporating support for multithreading scanning in the upcoming versions?
Re: Perhaps it's time to support multithreading scanning.
Yes please!
Discovering Files: One drive at a time.
Calculating Hashes (MD5 Partial): All drives at once
Calculating Hashes (MD5): All drives at once
Why is the initial directory scan (discovering files) one drive at a time? This appears to be one place to significantly speed up the program.
When I was testing DC on two internal NVMe drives, everything went great. Now that I am running it with 25TB on 14 drives, re-discovery (discovering files alone, before matching duplicates starts) takes 1.5 hours. So, once this next scan is done, I need to figure out how to use DC efficiently.
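To illustrate the idea of discovering files on all drives at once, here is a minimal Python sketch. It is not how Duplicate Cleaner is implemented; the function names `discover_files` and `discover_all` are hypothetical, and it assumes each scan root sits on a separate physical drive so the directory walks do not compete for the same device.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def discover_files(root):
    """Walk one root (for example one drive) and return every file path under it."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return found


def discover_all(roots, max_workers=None):
    """Walk all roots concurrently, one task per root. Returns {root: [paths]}."""
    roots = list(roots)
    if not roots:
        return {}
    # Assumption: one worker per root is a sensible default when each root
    # is its own physical drive; pass max_workers to cap it.
    workers = max_workers or len(roots)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(discover_files, roots)
        return dict(zip(roots, results))
```

With 14 drives this would start 14 directory walks together instead of one after another. Whether that helps in practice depends on the drives and how they are attached, so treat it as a starting point for measurement rather than a guaranteed speedup.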
- DigitalVolcano
- Site Admin
- Posts: 1863
- Joined: Thu Jun 09, 2011 10:04 am
Re: Perhaps it's time to support multithreading scanning.
Looking at multithreading for the file tree building for version 6. With the other scan modes it'll depend on the scan type, as some of the third-party libraries may not multithread well.
Re: Perhaps it's time to support multithreading scanning.
Absolutely yes please!
I have used v4 since 2018 and today I bought v5, and yes, I'm somewhat disappointed with the basic technical level. For example, no multithreading when calculating "MD5 Partial" results in only ~4MB/s calculation speed with my 8-core CPU and a fast NVMe SSD...
In addition, we are almost in 2024 and it's still a 32bit software - which might not be an actual limitation.
Aside from that, the software itself is great; otherwise I wouldn't have bought it a second time.
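As a rough illustration of what a multithreaded partial-hash pass could look like, here is a minimal Python sketch. It is only an assumption about the general technique, not Duplicate Cleaner's actual code: the names `partial_md5` and `partial_md5_many` are hypothetical, and the 64 KiB prefix size is an arbitrary placeholder, since the thread does not say how many bytes "MD5 Partial" reads.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Assumption: hash only the first 64 KiB of each file. The real prefix
# size used by "MD5 Partial" is not stated in this thread.
PARTIAL_BYTES = 64 * 1024


def partial_md5(path, nbytes=PARTIAL_BYTES):
    """Return the MD5 hex digest of the first nbytes of the file at path."""
    with open(path, "rb") as fh:
        return hashlib.md5(fh.read(nbytes)).hexdigest()


def partial_md5_many(paths, max_workers=8):
    """Hash many files concurrently. Returns {path: hex digest}."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # File reads release the interpreter lock, so the threads can keep
        # several read requests in flight on a fast SSD.
        return dict(zip(paths, pool.map(partial_md5, paths)))
```

Files whose partial digests differ cannot be duplicates, so only files sharing a digest would need the full hash afterwards. The worker count of 8 simply mirrors the 8-core CPU mentioned above and would need tuning.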

Last edited by uwek on Sat Dec 16, 2023 5:21 pm, edited 1 time in total.
-
- Posts: 3
- Joined: Tue Aug 29, 2023 11:28 am
Re: Perhaps it's time to support multithreading scanning.
Agreed, please modify it to use multiprocessing (the threading/process choice depends on many factors) in the file tree building phase.
I am not 100% sure it uses multiprocessing when scanning two sources even at hash time, because I only see one counter increase at any time. The counter that increments switches periodically between the two sources. But the I/O shows parallel reading, so not sure... can someone/DV confirm?
Re: Perhaps it's time to support multithreading scanning.
Quote: "we are almost in 2024 and it's still a 32bit software"
(AFAIK) it's "32", & it is also "64".
The "Duplicate Cleaner 5.exe" .exe itself shows as x86.
(And I'm not quite sure how it works, but...)
When it runs, depending on OS bit-ness, it will run as either x86 or x64.
(You will also see both x86 & x64 DLLs in the program's install directory.)
And as far as multithreading & such goes, you need to be careful that it is done "right", as there can be differences between theoretical efficiencies & what one actually "gets" (many factors can affect outcomes, more than "just" making something "multithreaded").
Re: Perhaps it's time to support multithreading scanning.
I am fairly confident that even if multithreading sometimes gains less in practice than one might expect, there is still a ton of speed to claim. Some tasks are pretty slow in general, like the initial file discovery and the metadata reading for pictures; the whole picture pipeline is pretty slow. Right now I'm doing duplicate checking for pictures and it does about 1 picture per second with the CPU sitting at 7% and every disk and the network basically idle. With other software it takes a lot less time even to compare videos, which should be much heavier. But I know multithreading is not something you do in a day, and you have to be very careful. Maybe using Windows indexing or something like Everything search would allow some improvement for the file discovery... but I don't know. Even some open source programs do it faster, and I don't think they use such features.
Re: Perhaps it's time to support multithreading scanning.
I second the call for more multithreading! DC Pro is one of the best programs out there as far as UI and functionality go. The main area where it falls down is speed (particularly in "calculating image metrics"). As you only have so much time to invest in improving the software, might I suggest you focus on this area? Otherwise your loyal customer base is likely to look elsewhere for better optimised programs.
Re: Perhaps it's time to support multithreading scanning.
For me, reading metadata in the image search also takes an enormous amount of time, like days.