Is there a way to speed up searches?

The best solution for finding and removing duplicate files.
drew
Posts: 37
Joined: Tue Mar 13, 2012 5:03 pm

Is there a way to speed up searches?

Post by drew »

Lots of files but on a fairly fast machine built for batch image and video editing
i7-3930 @3.20
ram 32.0 GB
win7 pro 64-bit
Storage (8) Vertex SSDs on a HP 3230 RAID card

using DC Pro 3.7

Currently searching 2 folders (492243 files) for dups
I am only 43% through and it's been running since yesterday (22hrs)

Uses only a small fraction of CPU (8%) and RAM (1.6 GB) capacity. Perf Mon screenshots on request.

Can this be changed?

Suggestions?
therube
Posts: 614
Joined: Tue Jun 28, 2011 4:38 pm

Re: Is there a way to speed up searches?

Post by therube »

What are your search criteria?

If you expect to find (at least some) "exact" duplicates, run a 'Regular Mode' 'Same Content' scan first, as that will always be faster than Image or Audio mode.

Likewise, if you're expecting exact dups and some other criteria fit too, like Name & Size, finding where those same name/size files exist might let you explore only those particular directories more directly. There you can add Same Content to the mix, know for sure that you're picking up dups, & cut them out.

Any filtering you can do?

Specific directory trees rather than an entire drive?
Specific file types?
A set of directories that you know will match, exactly or mostly?

So basically, start with a quicker, more generalized search & use that as a starting point for a more refined, specific search, culling those entries. Then, as that type of work proves less successful, expand your criteria to pick up more duplicates outside of your more constrained searches.
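
To make the "cheap pass first" idea concrete, here is a minimal Python sketch of a staged exact-duplicate search. It is an illustration only, not how Duplicate Cleaner works internally; the function names (`group_by_size`, `file_digest`, `exact_duplicates`) are made up for this example. Files are first grouped by size, which needs no file reads, and only files that share a size are hashed.

```python
import hashlib
import os
from collections import defaultdict


def group_by_size(paths):
    """Cheap first pass: only files of equal size can be byte-identical."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)
    return [group for group in by_size.values() if len(group) > 1]


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large images are never fully in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def exact_duplicates(paths):
    """Second pass: hash only the same-size candidates and group by digest."""
    dupes = []
    for group in group_by_size(paths):
        by_hash = defaultdict(list)
        for path in group:
            by_hash[file_digest(path)].append(path)
        dupes.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return dupes
```

Anything this pass removes no longer has to go through the much slower image comparison afterwards.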
drew
Posts: 37
Joined: Tue Mar 13, 2012 5:03 pm

Re: Is there a way to speed up searches?

Post by drew »

Have been using image mode since all files are .tif, .png or .jpg

@99% since in many cases the only difference between files is a color or a contrast correction

I do not search an entire drive, as that would be impossibly slow. Also, this is a data drive; system and program files are on another drive.

Usually the search is within one folder, but sometimes 2, or 3 folders

I will try the exact mode on the next search (current search is still running)

Can't argue that starting with a quicker, more generalized search & using that as a starting point for a more refined, specific search would make each search quicker, but wouldn't the total time be comparable?

Is there a way to bump up the CPU usage or speed up the process? Don't know if DC is throttled to keep from damaging systems. This system is water cooled for sustained load and would handle higher usage rates.
drew
Posts: 37
Joined: Tue Mar 13, 2012 5:03 pm

Re: Is there a way to speed up searches?

Post by drew »

...on the other hand, DC does not tax the computer to a point where a slowdown is noticeable. So if I plan ahead I can run DC in the background for a few days. That is assuming adding to or editing images in those folders does not create a problem.


Still would like to know if DC can be tuned to use a larger % of computer resources.


Edit:

Reading through the forum I found this "You could try factory resetting the database file. Close the program first, then delete this file:
C:\Users\[YOUR USER]\AppData\Roaming\DigitalVolcano\DuplicateCleaner\DuplicateCleanerPro.data"

If I moved that file from C:\ (single SSD) to E:\ (SSD array) would that help speed things up?
drew
Posts: 37
Joined: Tue Mar 13, 2012 5:03 pm

Re: Is there a way to speed up searches?

Post by drew »

Could I get a response on the question of devoting a larger % of computer resources to running Duplicate Cleaner?
DigitalVolcano
Site Admin
Posts: 1717
Joined: Thu Jun 09, 2011 10:04 am

Re: Is there a way to speed up searches?

Post by DigitalVolcano »

There is no way to use more resources in the current version. The main bottleneck in image mode is the hard drive: it has to read all the images to create the metrics for comparison. Once the images are cached it is a lot faster. Is it slow for you on the metrics or the comparison stage?

Version 4 (Upcoming) will hugely speed up the checking in standard mode - still doing some work on Image mode performance. Delete/file removal speed will be increased too.
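
For readers wondering why a cache makes repeat scans faster, here is a minimal Python sketch of the general idea. It is an assumption-laden illustration, not Duplicate Cleaner's actual cache format or code: `compute_metric` stands in for whatever image metric is used, and the JSON file and helper names are hypothetical. Each file's metric is stored under a key built from its path, size, and modification time, so an unchanged file is not read again.

```python
import json
import os


def cache_key(path):
    """Key changes whenever the file's path, size, or mtime changes."""
    st = os.stat(path)
    return f"{os.path.abspath(path)}|{st.st_size}|{st.st_mtime_ns}"


def load_cache(cache_file):
    """Load a previously saved cache, or start empty if none exists."""
    if os.path.exists(cache_file):
        with open(cache_file, "r", encoding="utf-8") as f:
            return json.load(f)
    return {}


def save_cache(cache, cache_file):
    """Persist the cache so the next scan can reuse it."""
    with open(cache_file, "w", encoding="utf-8") as f:
        json.dump(cache, f)


def metrics_for(path, cache, compute_metric):
    """Return the cached metric, computing it only on a cache miss."""
    key = cache_key(path)
    if key not in cache:
        cache[key] = compute_metric(path)  # slow path: the file is read here
    return cache[key]
```

On a first scan every file takes the slow path; on later scans only new or modified files do.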
drew
Posts: 37
Joined: Tue Mar 13, 2012 5:03 pm

Re: Is there a way to speed up searches?

Post by drew »

Not sure which would be considered slower, so I'll post some numbers. All files are images. Files checked: 493K. Duplicates found: 91K. Many of those are not really duplicates, for example where we have added a watermark for a customer.

Comparison stage: 12 - 20 hrs. I went home and left the computer on and it was done the next day.

2+ days: after a day, or some other period, the clock resets; the time shown is 8 hr, but that is not correct.

I'm still sorting through the list and most likely will be for the next few weeks, but after I finish I'll redo the search and set a timer if needed.

Deleting speed does not seem to be an issue.


"The main bottleneck in image mode is the hard drive"
Does DC take advantage of SSD/RAID capabilities?
isaac124
Posts: 1
Joined: Wed Aug 03, 2022 2:53 pm

Re: Is there a way to speed up searches?

Post by isaac124 »

I hate to dig up an old post, but Google pulled this post up as the most relevant to my question.
I would like to say that the theoretical bottleneck of HDD speed does not seem to be the case.
Using Windows 10.
I'm doing an image mode scan and seeing hash calculations of 50 MB/s. I check Task Manager and see no more than 2% utilization of my M.2 NVMe SSD, 31% utilization of memory, and 34% utilization of CPU.

The SSD can easily transfer 500 MB/s, and it's only reading 20-50 MB/s according to Resource Monitor.

So I feel I must ask the question again: what is the actual bottleneck in the software? Perhaps it has changed. I guess the number of CPU cores could be the bottleneck: the way it hashes images, it might need to pull one image out at a time, pass it through a pipeline involving a CPU core, then put it back where it found it before moving on to the next image. So there's a lot of waiting around for all the different parts of the machine involved?
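
To illustrate the guess above, here is a small Python sketch contrasting a one-file-at-a-time loop with a version that keeps several files in flight. This is only a model of the hypothesis, not Duplicate Cleaner's code: plain SHA-256 stands in for whatever image hash the program uses, and the names `digest`, `digest_sequential`, and `digest_parallel` are invented for the example. Threads help here because Python's `hashlib` releases the interpreter lock while hashing large buffers, so reads and hashing of different files can overlap.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(path, chunk_size=1 << 20):
    """Read one file in 1 MiB chunks and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def digest_sequential(paths):
    """One file at a time: the disk idles while the CPU hashes, and vice versa."""
    return {p: digest(p) for p in paths}


def digest_parallel(paths, workers=8):
    """Several files in flight at once, so reading and hashing can overlap."""
    paths = list(paths)  # materialize so the paths can be paired with results
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(paths, pool.map(digest, paths)))
```

If the sequential version shows the same pattern as the numbers above (low disk use, a fraction of the CPU busy) and the parallel one does not, that would point to a serial pipeline rather than drive speed as the limit.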

Isaac