Zip files

arnoudj
Posts: 1
Joined: Tue May 19, 2020 11:27 am

Zip files

Post by arnoudj »

I had an enormous problem removing files. After about 10 removals the process slowed down to 1 file a minute or more, and for days I did not know why. Until a few minutes ago: zip files were being treated as folders. That causes enormous delays, because the zip file is unzipped, one file is deleted, and the rest is zipped again, which is a time-consuming process. So untick the option for zip files and everything will be back to normal.
TheBabyGraz
Posts: 1
Joined: Sat May 30, 2020 3:26 am

Re: Zip files

Post by TheBabyGraz »

Huh. I'd avoided deleting zipped files because of the slowdown, but due to a backup/restore after the new v2004 Windows 10 May update, I had to restore from zips and was just trying to cross-reference and delete redundant files from the zipped folders (25K files at 100GB / 112GB uncompressed, spread across twenty 10GB zip files). I anticipated the slowdown but was absolutely baffled when I saw that, after stepping away to let it run overnight on a 3950x with a PCIe 4.0 NVMe and a 32GB cached RAM buffer (it's not slow is what I'm saying), it had only deleted a total of ~3000 files after eight hours, but incurred a whopping EIGHTY SIX TERABYTES of disk writing in the process. I've been trying to make sense of the insane slowness and massive TBW disk wear, and I think you've got the explanation there: it is mapping, unzipping, deleting and rezipping each 10GB zip folder for each marked file individually, rather than deleting all marked files in the zip folder once it's open.
So yeah, definitely +1'ing the recommendation to just not even touch the "Scan in Zip Files" option for jobs with more than 500 files or 4GB, let alone 25K / 100GB.

That said, if there's an easy fix I'm happy to hear it. Running under Windows shell halts the massive disk writing (as far as I can tell) but seems to slow down the process even further, and increasing the zip files temporary space thresholds to 4, 10 and 20GB in the advanced options still seemed to have little to no effect. Genuine question: could increasing the threshold past 112GB to just let it unzip all marked folders for the duration have the desired effect, or would that kill my PC?
DigitalVolcano
Site Admin
Posts: 1717
Joined: Thu Jun 09, 2011 10:04 am

Re: Zip files

Post by DigitalVolcano »

Thanks for the feedback - you are correct.
Currently re-writing this for version 5. The problem is (as you've deduced) that it removes the zip entry for the deleted file and saves the zip. This causes the entire file to be rewritten. Things go south when there are entire duplicate zip files - the delete code isn't currently smart enough to determine that it should just remove the entire .zip file in one go (or to remove multiple entries at once while the zip is open). For a large zip file of a thousand files this would cause the file to be re-written a thousand times. Not ideal.
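
To illustrate the difference, here is a minimal Python sketch of the batched approach: all marked entries are dropped in one pass and the archive is rewritten once, or removed outright if nothing would be left. This is not Duplicate Cleaner's code; the function name `delete_entries` is hypothetical, and the sketch assumes unencrypted archives that Python's standard `zipfile` module can read.

```python
import os
import tempfile
import zipfile


def delete_entries(zip_path, names_to_delete):
    """Remove several entries from a zip with a single rewrite.

    Returns True if the whole archive was removed because every entry
    was marked for deletion, otherwise False.
    """
    doomed = set(names_to_delete)
    tmp_path = None
    with zipfile.ZipFile(zip_path) as src:
        keep = [info for info in src.infolist() if info.filename not in doomed]
        if keep:
            # Write the survivors to a temporary archive next to the original.
            folder = os.path.dirname(os.path.abspath(zip_path))
            fd, tmp_path = tempfile.mkstemp(suffix=".zip", dir=folder)
            os.close(fd)
            with zipfile.ZipFile(tmp_path, "w") as dst:
                for info in keep:
                    # Each kept entry is decompressed and recompressed once,
                    # using the compression method recorded in its ZipInfo.
                    dst.writestr(info, src.read(info))
    if tmp_path is None:
        # Nothing left to keep: delete the archive instead of leaving
        # an empty zip behind.
        os.remove(zip_path)
        return True
    os.replace(tmp_path, zip_path)
    return False
```

With this shape, a zip holding a thousand marked files costs one rewrite (or none at all when every entry is marked) instead of a thousand.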

Running via the shell passes delete/move operations to the Windows Explorer API, which may have a better write efficiency but can be slow with zips.
abobymous
Posts: 19
Joined: Thu Sep 20, 2018 7:30 am

Re: Zip files

Post by abobymous »

+1

I would like to upvote this as a feature request. Or rather, two specific features:
a. Multiple Deletes and only 1 rewrite of the archive (ZIP, etc.)
b. Remove the ZIP file if all files are deleted. Currently, we have to find all the *.zip files with a size of 22 bytes and delete them after running DC Pro.
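
As a stopgap for point b, the manual clean-up can be scripted. The 22-byte figure matches a zip that contains only an end-of-central-directory record and no entries. The sketch below is a hedged example, not part of DC Pro: `find_empty_zips` is a hypothetical helper that lists such files under a folder and double-checks each one by opening it, so a 22-byte file that merely has a .zip extension is skipped.

```python
import os
import zipfile

# A zip with no entries and no comment is just a 22-byte
# end-of-central-directory record.
EMPTY_ZIP_SIZE = 22


def find_empty_zips(root):
    """Return a sorted list of paths to empty .zip files under root."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".zip"):
                continue
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) != EMPTY_ZIP_SIZE:
                continue
            try:
                # Confirm it really is a valid archive with no entries.
                with zipfile.ZipFile(path) as zf:
                    if not zf.namelist():
                        empty.append(path)
            except zipfile.BadZipFile:
                pass  # 22 bytes of something else; leave it alone
    return sorted(empty)
```

Review the returned list before deleting anything, for example by printing it first and only then calling `os.remove` on each path.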

Thank you for the continued development of this very useful tool!
DigitalVolcano
Site Admin
Posts: 1717
Joined: Thu Jun 09, 2011 10:04 am

Re: Zip files

Post by DigitalVolcano »

Try unchecking 'via windows shell' on the current version. DC should handle zip deletes in one pass (otherwise it's left to the OS to do it and it'll be slower usually).
abobymous
Posts: 19
Joined: Thu Sep 20, 2018 7:30 am

Re: Zip files

Post by abobymous »

Will do
pnachtwey
Posts: 15
Joined: Thu Apr 15, 2021 9:44 pm

Re: Zip files

Post by pnachtwey »

I have had the same problem. I do not use the "treat .zip files like folders" option because Duplicate Cleaner DOES NOT treat .zip files like folders.
A folder can be protected. A .zip file cannot. What pisses me off is losing a file from a .zip file. A .zip file is often a snapshot of the files that make up a project, and losing a file or two means that .zip is no longer a whole project and I must hunt for files to make it whole.

I have suggested before that .zip files should be protected by default. Also, to make searching faster, DC5 should unzip the files into a protected folder so DC5 doesn't need to search through the .zip file, which is extremely slow. When done, the extracted directory should be deleted. Another option is to search through the .zip files ONCE and make a database of what is in each .zip file so there is no need to search through it again. This will work well IF the option to delete files in .zip files is not set.
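
The "search once and build a database" idea could look roughly like the Python sketch below. It is an assumption about one possible design, not how DC5 works: the function names are hypothetical, and it uses the uncompressed size and CRC-32 that every zip already stores per entry, so nothing has to be extracted. A matching size and CRC-32 only marks entries as duplicate candidates; a byte-for-byte comparison would still be needed to be certain.

```python
import zipfile
from collections import defaultdict


def index_zips(zip_paths):
    """Read each archive's directory once and index entries by (size, CRC-32)."""
    index = defaultdict(list)
    for zip_path in zip_paths:
        with zipfile.ZipFile(zip_path) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue  # folder entries carry no file data
                key = (info.file_size, info.CRC)
                index[key].append((zip_path, info.filename))
    return index


def duplicate_candidates(index):
    """Keep only the keys that occur in more than one location."""
    return {key: locs for key, locs in index.items() if len(locs) > 1}
```

Because the archives are only read, never modified, this fits the read-only case described above where deleting files inside .zip files is switched off.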