Duplicate Cleaner FAILS!


Post by aku »

Tried your Duplicate Cleaner on my computer but it fails with the following error:

Error during duplicate file scan: Index was outside the bounds of the array. Report this error to DigitalVolcano and we'll try and fix it!

The scan reports the following:

Checked 100% - 1426101/1427001 files (463 GB)
64331 Duplicate sets found

Post by aku »

Another problem is that there are over 5 million files on my disk and Duplicate Cleaner only reports seeing about 1.4 million of them. How come?

The files and directories are not protected and are available for the program to scan.

That said, your program looks the most promising of the ones I have tried, if only it worked. All the other duplicate file programs I have tried either crash or can't handle the number of files I have.

I guess I'll have to write my own program! Pretty easy to do if you don't need a fancy GUI.

Post by Fool4UAnyway »

463 GB?

Start with smaller subsets to reduce your totals bit by bit.

Post by James »

When I scan (which I do periodically), I cover more than 2 TB (~1.5M files) without error. It takes hours, but then I would expect it to. I'm actually very pleased that it can handle that much data without choking or driving memory usage through the roof... Mega-kudos for that!

So I don't think your issue is the size of your data... Try a systematic approach: select half the folders you're scanning now. Do you get the error? If not, select the other half. Repeat until you zoom in on the folder that's breaking you.

If you don't get errors in either half, add folders back a few at a time until it breaks. When you find the folder that breaks you, exclude it and add the remaining folders one by one... If more than one folder breaks you, it's probably a size/file-count issue. If it's one particular folder, there's something in there that's tripping you up.

aku> Yeah, good luck with that. I've never seen an arrogant programmer who's also a good programmer. Are you scanning your C:\ drive? If so, your discrepancy is probably because DC ignores junctions and symbolic links, which Windows, esp. Vista and 7, have a lot of. Would your program be smart enough to handle file system objects that look and act just like directories but aren't? Or would you permanently delete a whole bunch of really important files because there's really only 1 copy of each?
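For the record, skipping those in .NET is just an attribute check during directory enumeration. A rough sketch of the idea in C# (your language of choice, apparently; the class and names here are mine, not DC's internals):

    using System;
    using System.Collections.Generic;
    using System.IO;

    static class SafeWalker
    {
        // Recursively yield files under root, skipping junctions and
        // symbolic links: both appear as directories with the
        // ReparsePoint attribute set. (Error handling omitted for brevity.)
        public static IEnumerable<string> EnumerateFiles(string root)
        {
            foreach (var file in Directory.GetFiles(root))
                yield return file;

            foreach (var dir in Directory.GetDirectories(root))
            {
                if ((File.GetAttributes(dir) & FileAttributes.ReparsePoint) != 0)
                    continue; // same files, different path: don't double-count

                foreach (var file in EnumerateFiles(dir))
                    yield return file;
            }
        }
    }

Miss that check and every junction makes a file with only one real copy look like a duplicate of itself.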

Also, make sure you're not filtering out 0-length files, you're querying all file types, etc. DC is pretty thorough, and pretty spot on with its findings. If it says 1.4 million files, I trust it.

Post by aku »

Fool4UAnyway, 463 GB is nothing with current multi-terabyte disks; I have a 2 TB RAID 5 array. Any tool that can't handle 463 GB is a toy.

Well, I don't have time to start playing with DC. I wrote my own duplicate checker in C# in about 2 hours, and it's doing the job beautifully, and much, much faster than DC too. No fancy GUI, though.
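The core logic is nothing special: group files by size first, then hash only the size collisions. Something roughly this shape (a simplified sketch of the idea, not my exact code):

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Security.Cryptography;

    class DupFinder
    {
        static void Main(string[] args)
        {
            // args[0] = root folder to scan (illustrative; real code
            // needs error handling). Only files sharing a size can be
            // duplicates, so group by size and hash just the collisions.
            var bySize = Directory
                .EnumerateFiles(args[0], "*", SearchOption.AllDirectories)
                .GroupBy(f => new FileInfo(f).Length)
                .Where(g => g.Count() > 1);

            using (var md5 = MD5.Create())
            {
                foreach (var sizeGroup in bySize)
                {
                    // Within each size group, confirm duplicates by content hash.
                    var byHash = sizeGroup.GroupBy(f =>
                    {
                        using (var stream = File.OpenRead(f))
                            return BitConverter.ToString(md5.ComputeHash(stream));
                    }).Where(g => g.Count() > 1);

                    foreach (var dupes in byHash)
                    {
                        Console.WriteLine("Duplicate set:");
                        foreach (var f in dupes)
                            Console.WriteLine("  " + f);
                    }
                }
            }
        }
    }

Swap MD5 for SHA-1 or a full byte-by-byte compare if you worry about hash collisions.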

Sorry James, there are over 5 million files on my disk; both Windows Explorer and my own duplicate checker say so, and that's what I trust. And I know it's about right: I have two file sets of roughly 2.5 million files each, partly duplicates of each other. That's why I needed a duplicate cleaner in the first place.