More memory efficient?

Post by Eugene »

First, thanks for the program. I've tried more than a dozen similar programs (I think I researched everything available quite exhaustively) and ended up with this one.

My problem is the following. I have a big backup storage (1TB, 1.3 million files) which contains a lot of identical files.

Duplicate Cleaner seems suitable for such a large task (the scan takes around 16 hours), but it seems to run out of memory while populating the list on my 2 GB machine (Duplicate Cleaner's memory allocation at the moment of the crash is 1.9 GB). It stops with the message "Error in GO! process... This item's control has been deleted". The window displayed at this moment is "Populating List", with the progress bar at roughly 15% and the text "217609 duplicates found". The version is 1.4.3.

I wonder if it is possible to make Duplicate Cleaner more memory efficient? One idea is to make it automatically save the duplicate list before populating it, so the list could later be loaded or processed manually. Or, in my case, instead of populating a list it could save a batch file that would replace all duplicates with hardlinks.
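
To illustrate the second idea, a minimal sketch of what the per-file replacement could look like in AutoIt (paths and names here are just placeholders, and this is only an illustration, not something Duplicate Cleaner does today). It assumes the original and the duplicate are on the same NTFS volume, since hardlinks cannot cross volumes:

<code>
; Minimal sketch (not part of Duplicate Cleaner): replace one duplicate file
; with a hardlink to the kept original. Paths below are placeholders; both
; files must live on the same NTFS volume for a hardlink to be possible.
$sOriginal  = "D:\backup\keep\file.dat"
$sDuplicate = "D:\backup\copy\file.dat"

If FileDelete($sDuplicate) Then
    ; CreateHardLinkW(lpFileName, lpExistingFileName, lpSecurityAttributes)
    $aRet = DllCall("kernel32.dll", "bool", "CreateHardLinkW", _
            "wstr", $sDuplicate, "wstr", $sOriginal, "ptr", 0)
    If @error Then
        MsgBox(0x10, "Hardlink failed", "DllCall error for " & $sDuplicate)
    ElseIf Not $aRet[0] Then
        MsgBox(0x10, "Hardlink failed", "CreateHardLinkW returned FALSE for " & $sDuplicate)
    EndIf
Else
    MsgBox(0x10, "Delete failed", "Could not delete " & $sDuplicate)
EndIf
</code>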

Again thanks for Duplicate Cleaner.

PS: I'll be happy to report back on the results if any improvements are made.

Post by dv2 »

Good points - I've considered dumping the 'All files' list, as this must waste a lot of memory. (Does anyone use it?) I keep meaning to run some tests on the lists to see at what point they break. Backing up the list to a text file first is a good idea - I might look into it.

All these issues will be cleaned up when I port DC over to .NET for version 2.0 (it currently uses the creaking VB6 framework).

Post by Mull »

I would also like to report that I have received this same error, with more than 200,000 duplicates found.

Duplicate Cleaner 1.4.3 was running on 4 GB of RAM (3.25 GB available on Windows XP), and I did check the amount of RAM used when the error came up.

If I remember correctly, it was around the 2,030,000 KB mark, which matches the roughly 1.9 GB figure Eugene quoted.

I don't use the "All files" list, as the function is useless to me, and does not tie in with the program's other functions.

I am only interested in duplicates found. There are many other dedicated programs that can solely list files along with their file data at a much faster rate.

Post by DV »

Thanks.
I've tested, and there is no fixed limit to the number of items in the lists. It looks like the barrier is the total memory used (probably 2 GB). Switching off the All Files list should help - I will look at that for the next release.

Post by Patrick »

The workaround for this is to stop the scan before it reaches the error point. Stopping the scan still allows you to work with the duplicate sets that have been found so far. I made an AutoIt script to automate this, as well as to close an "Internet Connection Error" message box that stops everything until it is closed.

Even with these limitations, Duplicate Cleaner is by far the best out there. The AutoIt script is included below. The script stops the scan when $WHEN_TO_STOP duplicate groups have been found, so that is the only part you need to modify.

<code>
#include <WinAPI.au3>

; Hard-coded addresses inside the Duplicate Cleaner 1.4.3 process
Const $ADDR = 0x004C2038        ; num dupe groups found
;Const $ADDR = 0x004C2180       ; num files scanned?
Const $WHEN_TO_STOP = 50000     ; stop the scan after this many duplicate groups

$pid = WinGetProcess("[CLASS:ThunderRT6FormDC]")
$dostop = False
If $WHEN_TO_STOP Then
   $dostop = True
   $s_whentostop = DllStructCreate("DWORD")
   ; 0x10 = PROCESS_VM_READ, needed by _WinAPI_ReadProcessMemory
   $hProcess = _WinAPI_OpenProcess(0x10, 0, $pid, True)
   If @error Then Exit @error
EndIf

Do
   Sleep(1000)
   If $dostop Then
      ; Read the duplicate-group counter straight out of DC's memory
      $iRead = 0
      If Not _WinAPI_ReadProcessMemory($hProcess, $ADDR, DllStructGetPtr($s_whentostop), 4, $iRead) Or $iRead <> 4 Then Exit 1
      If DllStructGetData($s_whentostop, 1) >= $WHEN_TO_STOP Then
         _WinAPI_CloseHandle($hProcess)
         CancelScan()
         $dostop = False
      EndIf
   EndIf
   ; Dismiss the "Internet Connection Error" box so it doesn't block everything
   If WinExists("[TITLE:Internet Connection Error]") Then WinClose("[TITLE:Internet Connection Error]")
Until Not ProcessExists($pid)
Exit 0

Func CancelScan()
   ; Press the button that cancels the scan, then confirm the prompt
   WinActivate("[CLASS:ThunderRT6FormDC]")
   Sleep(2000)
   WinWaitActive("[CLASS:ThunderRT6FormDC]")
   ControlClick("[CLASS:ThunderRT6FormDC]", "", "[CLASS:ThunderRT6CommandButton; INSTANCE:8]")
   Sleep(2000)
   WinWaitActive("[TITLE:Duplicate Cleaner]")
   ControlClick("[TITLE:Duplicate Cleaner]", "", "[CLASS:Button; INSTANCE:1]")
EndFunc
</code>
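
One caveat: the two $ADDR values are raw addresses inside the Duplicate Cleaner 1.4.3 process, so on any other version they will almost certainly be different and would need to be located again before the script does anything useful.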

Post by axis »

Interesting script you have there, Patrick, though it's hard to estimate how many duplicates you will come up with before the error point is reached.

I got a crash message at 55.5% of 2.02 million files, with 142941 duplicates at that point.

RAM used by the program was 1,922,356 KB at the crash point.

I guess it was a good 20-hour stress test.

Post by Vaiko »

I get the same crash with only 40k duplicates, but DC always terminates at 1,065,725 scanned files. I think the memory limitation is not only on the number of duplicates but also on the number of scanned files. I'm not sure how switching off the All Files list helps with this (I didn't find that option in the UI, by the way).

Post by Vaiko »

Sorry, I can't edit my post. It's not the exact same error as in the original post, because my DC terminates during the scan rather than while populating the list, but I think it may also be an out-of-memory crash.

Post by DV »

Ditching the All Files list will save the memory used to store the list control. Hopefully this won't be an issue once DC is ported over to .NET.

Post by Marshall Shield »

Worked OK on my main Dell Precision T3400.
Started on my external hard drive: it found 1,006,525 files and 90,745 duplicates.
Three times now it has finished the first level, then started the second, but failed and asked to be closed.
This hard drive is close to being full; there is enough room on the main hard drive.