Perhaps it's time to support multithreading scanning.

XMJL
Posts: 1
Joined: Thu Nov 23, 2023 10:33 am

Perhaps it's time to support multithreading scanning.

Post by XMJL »

With SSDs being very common now, disk speed is often no longer the bottleneck. Can you please consider incorporating support for multithreading scanning in the upcoming versions?
wwcanoer
Posts: 51
Joined: Wed Aug 19, 2020 5:49 am

Re: Perhaps it's time to support multithreading scanning.

Post by wwcanoer »

Yes please!
Discovering Files: One drive at a time.
Calculating Hashes (MD5 Partial): All drives at once
Calculating Hashes (MD5): All drives at once

Why is the initial directory scan (discovering files) one drive at a time? This appears to be one place to significantly speed up the program.

When I was testing DC on two internal NVMe drives, everything went great. Now that I am running it with 25TB on 14 drives, re-discovery (Discovering Files alone, before matching duplicates starts) takes 1.5 hours. So, once this next scan is done, I need to figure out how to use DC efficiently.
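
To illustrate what discovering files on all drives at once could look like, here is a minimal Python sketch. It is not how Duplicate Cleaner works internally, which is not shown in this thread; the function names `discover_files` and `discover_all` are made up for the example, and it assumes each root folder sits on a different physical drive so the walks do not compete for the same disk.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def discover_files(root):
    """Walk one root folder and return the paths of all files under it."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return found


def discover_all(roots, max_workers=None):
    """Scan every root in its own thread and return {root: [file paths]}.

    Assumption: one thread per root (i.e. per drive) unless max_workers is set.
    """
    roots = list(roots)
    if not roots:
        return {}
    workers = max_workers or len(roots)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in the same order as the input roots
        results = pool.map(discover_files, roots)
        return dict(zip(roots, results))
```

Something like `discover_all(["D:\\", "E:\\", "F:\\"])` would then walk the three drives concurrently. Whether that is actually faster depends on the drives and the file system, so it would need measuring.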
DigitalVolcano
Site Admin
Posts: 1731
Joined: Thu Jun 09, 2011 10:04 am

Re: Perhaps it's time to support multithreading scanning.

Post by DigitalVolcano »

Looking at multithreading for the file tree building for version 6. With the other scan modes it'll depend on the scan type, as some of the third-party libraries may not multithread well.
uwek
Posts: 1
Joined: Sat Dec 16, 2023 4:59 pm

Re: Perhaps it's time to support multithreading scanning.

Post by uwek »

Absolutely yes please!

I've used v4 since 2018, and today I bought v5. And yes, I'm somewhat disappointed with the basic technical level. For example, no multithreading when calculating "MD5 Partial" hashes results in only ~4MB/s calculation speed with my 8-core CPU and a fast NVMe SSD...
In addition, we are almost in 2024 and it's still a 32bit software - which might not be an actual limitation.

Aside from that, the software itself is great - otherwise I wouldn't have bought it a 2nd time :-)
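
For anyone wondering what multithreaded partial hashing might look like, here is a rough Python sketch. It is only an illustration, not DC's actual code: I don't know how much of each file "MD5 Partial" reads, so `PARTIAL_BYTES` is an assumed value, and `partial_md5` and `hash_many` are names invented for the example.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Assumption: hash only the first 64 KiB of each file. The real size used
# by "MD5 Partial" is not stated in this thread.
PARTIAL_BYTES = 64 * 1024


def partial_md5(path, nbytes=PARTIAL_BYTES):
    """Return the MD5 hex digest of the first nbytes of a file."""
    with open(path, "rb") as fh:
        return hashlib.md5(fh.read(nbytes)).hexdigest()


def hash_many(paths, max_workers=8):
    """Hash many files with a pool of threads and return {path: digest}."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Threads overlap the file reads; results come back in input order
        return dict(zip(paths, pool.map(partial_md5, paths)))
```

Files whose partial digests differ cannot be duplicates, so only the matching ones would need a full hash afterwards.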
Last edited by uwek on Sat Dec 16, 2023 5:21 pm, edited 1 time in total.
killermilind
Posts: 3
Joined: Tue Aug 29, 2023 11:28 am

Re: Perhaps it's time to support multithreading scanning.

Post by killermilind »

Agreed, please modify it to use multiprocessing (the threading/process choice depends on many factors) at the file tree building phase.

I am not 100% sure it uses multiprocessing when scanning two sources even at hash time, because I only see one counter increase at any time. The counter that increments switches periodically between the two sources. But the I/O shows parallel reading, so not sure... can someone/DV confirm?
therube
Posts: 615
Joined: Tue Jun 28, 2011 4:38 pm

Re: Perhaps it's time to support multithreading scanning.

Post by therube »

"we are almost in 2024 and it's still a 32bit software"
(AFAIK) it's "32", & it is also "64".
The "Duplicate Cleaner 5.exe" .exe itself shows as x86.
(And I'm not quite sure how it works, but...)
When it runs, depending on OS bit-ness, it will run as either x86 or x64.
(You will also see both x86 & x64 .dll's in the program's install directory.)


And as far as multithreading & such, you need to be careful that it is done "right".
As there can be differences between theoretical efficiencies & what one actually "gets" (as in many factors can affect outcomes, more than "just" making something "multithreaded").
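
One way to check what one actually "gets" is simply to time the same work both ways. A minimal Python sketch, with made-up helper names (`time_sequential`, `time_threaded`) and an arbitrary default of 4 worker threads:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def time_sequential(task, items):
    """Run task on each item one after another; return (results, seconds)."""
    start = time.perf_counter()
    results = [task(item) for item in items]
    return results, time.perf_counter() - start


def time_threaded(task, items, max_workers=4):
    """Run task on the items in a thread pool; return (results, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(task, items))
    return results, time.perf_counter() - start
```

Run both on the same list of files (ideally more than once, since the OS file cache can make a second pass look faster than it really is) and compare the two timings before deciding the threaded version is a win.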
SiMoZ_287
Posts: 16
Joined: Tue Nov 16, 2021 5:42 pm

Re: Perhaps it's time to support multithreading scanning.

Post by SiMoZ_287 »

I am fairly confident that, even if multithreading can sometimes be less effective in practice than one might think, there is still a ton of speed to claim. Some tasks are pretty slow in general, like the initial file discovery, the metadata reading for pictures, and the whole picture pipeline. Right now I'm doing duplicate checking for pictures and it does about 1 picture each second, with the CPU sitting at 7% and every disk and the network basically idle. With other software it takes a lot less time even to compare videos, which should be much heavier. But I know multithreading is not something you do in a day, and you have to be very careful. Maybe using Windows indexing or something like Everything search may allow for some improvement in the file discovery... but I don't know; even some open source programs do it faster, and I don't think they use such features.
canman
Posts: 1
Joined: Fri Jul 07, 2023 9:38 am

Re: Perhaps it's time to support multithreading scanning.

Post by canman »

I second the call for more multi-threading! DC Pro is one of the best programs out there as far as UI and functionality -- the main area where it falls down is around speed (particularly on the "calculating image metrics" step). As you only have so much time to invest in improving the software, might I suggest you focus on this area (otherwise your loyal customer base is likely to look elsewhere for better optimised programs).
SiMoZ_287
Posts: 16
Joined: Tue Nov 16, 2021 5:42 pm

Re: Perhaps it's time to support multithreading scanning.

Post by SiMoZ_287 »

For me, reading metadata in the image search also takes an enormous amount of time, like days.