Finding duplicates in zipped or other compressed formats is EXTREMELY SLOW!

pnachtwey
Posts: 15
Joined: Thu Apr 15, 2021 9:44 pm

Finding duplicates in zipped or other compressed formats is EXTREMELY SLOW!

Post by pnachtwey »

I have no idea what makes Duplicate Cleaner Pro so slow. It is so slow that I no longer use the "treat compressed files as directories" option. I expand any compressed file I am interested in into a temporary folder and then look for duplicates.

It appears that Duplicate Cleaner Pro scans through the compressed file every time it thinks it is necessary. Currently Duplicate Cleaner Pro 5 takes hours to do a scan if one tries to scan through compressed files.
What Duplicate Cleaner Pro should do is decompress the compressed file into a temporary file/directory on the disk so it can compare files more rapidly. When done, the temporary directory/file is used to make the new compressed file, and the temporary file/directory is deleted. The only problem I see with my approach is that if the drive is nearly full, expanding all the zip files may overflow the drive.
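A minimal sketch of that extract-then-compare idea, in Python and assuming a .zip archive (the standard library's `zipfile` module does not read .7z or the other formats). The archive is decompressed once, in one sequential pass, into a temporary directory that is removed automatically afterwards, and the extracted files are grouped by SHA-256 hash. The function names and the `work_dir` parameter are hypothetical; deleting the chosen duplicates and re-compressing the folder are left out.

```python
import hashlib
import os
import tempfile
import zipfile
from collections import defaultdict


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file on disk in 1 MiB chunks so large files are not read whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicates_via_temp_dir(archive_path, work_dir=None):
    """Extract a .zip once into a temporary directory, then group its files
    by content hash. Returns {sha256: [relative paths]} for shared content."""
    groups = defaultdict(list)
    # work_dir (hypothetical parameter) picks the drive used for the temporary copy
    with tempfile.TemporaryDirectory(dir=work_dir) as tmp:
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(tmp)  # one sequential decompression pass
        for root, _dirs, files in os.walk(tmp):
            for name in files:
                full = os.path.join(root, name)
                groups[file_sha256(full)].append(os.path.relpath(full, tmp))
    # The temporary directory is gone here; keep only hashes seen twice or more
    return {h: sorted(paths) for h, paths in groups.items() if len(paths) > 1}
```

Because `tempfile.TemporaryDirectory` accepts a `dir=` argument, the temporary copy can be placed on a drive with more free space, which is one way around the nearly-full-drive problem.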

I like that version 5 supports many different compressed formats in addition to .zip, but I feel the implementation is poor.

Outside of that, I feel that Duplicate Cleaner Pro is a must for IT people or anyone with a NAS or big USB drives used for storage.

My solution: remove the option to search through compressed files at the bottom of the screen. Instead, allow one to select the compressed files one is interested in as if they were real directories. This way only the compressed files one is interested in are searched, and again, I would decompress these files into temporary folders to make the comparison easier. I would hide this part from the user.
pnachtwey
Posts: 15
Joined: Thu Apr 15, 2021 9:44 pm

Re: Finding duplicates in zipped or other compressed formats is EXTREMELY SLOW!

Post by pnachtwey »

I am doing a test.
I have two compressed files alone in a tmp directory.
One is a .7z file and the other a .zip file, but both contain similar files. The .zip file is an older version of the .7z file.
The .zip file is 20 MB and the .7z file is 15 MB.
I am finding the duplicates in both files and it is taking forever. Looking at the screen, it estimates it will take about 58-60 min to finish.
I should have started an independent timer because the estimated time keeps changing.
It would be faster to extract each compressed file into its own temporary directory and then look for duplicate files in the temporary directories. Then re-compress the files, delete the old original compressed files, and delete the temporary directories. The user doesn't need to see the gory details.
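Roughly, for this two-archive test: each archive gets its own temporary directory, and content present in both is reported as pairs of paths. This is a sketch assuming both archives are .zip files; the .7z side would need a third-party reader, since Python's standard library has none. The function names are hypothetical, and each extracted file is read into memory whole for brevity.

```python
import hashlib
import os
import tempfile
import zipfile


def hash_tree(root):
    """Map content hash -> list of paths (relative to root) for every file under root."""
    index = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as f:  # whole-file read, fine for a sketch
                digest = hashlib.sha256(f.read()).hexdigest()
            index.setdefault(digest, []).append(os.path.relpath(full, root))
    return index


def cross_archive_duplicates(zip_a, zip_b):
    """Extract each archive into its own temporary directory and report content
    that appears in both, as (paths_in_a, paths_in_b) pairs."""
    with tempfile.TemporaryDirectory() as dir_a, tempfile.TemporaryDirectory() as dir_b:
        for archive, target in ((zip_a, dir_a), (zip_b, dir_b)):
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(target)
        index_a, index_b = hash_tree(dir_a), hash_tree(dir_b)
    # Both temporary directories are deleted once the with-block exits
    common = index_a.keys() & index_b.keys()
    return [(sorted(index_a[h]), sorted(index_b[h])) for h in sorted(common)]
```

A file that was renamed or moved between the two versions still matches, because only content is compared.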

This is a good example of "no good deed goes unpunished", but really, this wasn't thought out well.

I am kind of disappointed because I recently upgraded a computer from the free version to the paid version to get the compressed-file capability, only to find it is almost useless.
pnachtwey
Posts: 15
Joined: Thu Apr 15, 2021 9:44 pm

Re: Finding duplicates in zipped or other compressed formats is EXTREMELY SLOW!

Post by pnachtwey »

It is now many hours later. It took 1 h 10+ min to find the duplicates, and it has taken over 2 hours to remove the duplicates.
pnachtwey
Posts: 15
Joined: Thu Apr 15, 2021 9:44 pm

Re: Finding duplicates in zipped or other compressed formats is EXTREMELY SLOW!

Post by pnachtwey »

After 6+ hours I gave up. I manually did what I suggested above. It took about 15 minutes to extract the .zip and .7z files into two separate temporary folders. Duplicate Cleaner Pro then found the duplicates in the two folders quickly. It then took me some time to decide which duplicates to delete and to merge what was left into one directory, which I then 7-Zipped into a new .7z file to replace the original. The whole process took me about 45 minutes. Most of the time was spent unzipping, manually manipulating files, and then zipping back up. Duplicate Cleaner Pro took at most 5 minutes of that time.

I don't expect looking for duplicates in compressed files to be fast, but it should finish in my lifetime or before the power goes out.
Currently we can't select which compressed files to look at. We should be able to, and the selected compressed files should be extracted to a temporary folder.

As it is, the compressed-file feature of Duplicate Cleaner Pro is unusable.
pnachtwey
Posts: 15
Joined: Thu Apr 15, 2021 9:44 pm

Re: Finding duplicates in zipped or other compressed formats is EXTREMELY SLOW!

Post by pnachtwey »

Months later and nothing has been done.
I do not use the "treat .zip files like folders" option because Duplicate Cleaner DOES NOT treat .zip files like folders.
A folder can be protected. A .zip file cannot. What pisses me off is losing a file from a .zip file. A .zip file is often a snapshot of the files that make up a project, and losing a file or two means that the .zip is no longer a whole project and I must hunt for files to make it whole.
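The protection asked for here can be expressed as a rule applied before anything is deleted: members of an archive are never deletion candidates. A hypothetical sketch, assuming archive members are reported as `archive::member` (a made-up notation) and that, when no archive holds a copy, the first loose copy in sorted order is the one kept:

```python
ARCHIVE_SEPARATOR = "::"  # hypothetical notation: "archive.zip::member/path"


def is_inside_archive(location):
    """True when a location names a member of an archive rather than a loose file."""
    return ARCHIVE_SEPARATOR in location


def files_to_delete(duplicate_group):
    """Given locations that all hold identical content, return those that may be
    deleted. Archive members are never returned (protected by default), and at
    least one copy of the content always survives."""
    loose = sorted(loc for loc in duplicate_group if not is_inside_archive(loc))
    if len(loose) == len(duplicate_group):
        # No archive holds this content: keep the first loose copy, delete the rest
        return loose[1:]
    # At least one archive member holds the content, so every loose copy is redundant
    return loose
```

With this rule, a group made up only of archive members produces nothing to delete, so a project snapshot can never lose a file.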

I have suggested before that .zip files should be protected by default. Also, to make searching faster, DC5 should unzip the files into a protected folder so it doesn't need to search through the .zip file, which is extremely slow. When done, the extracted directory should be deleted. Another option is to search through the .zip files ONCE and build a database of what is in each .zip file so there is no need to search through the .zip files again. This will work well IF the option to delete files in .zip files is not set.
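The "scan once, keep a database" option could look something like this sketch for .zip files. A ZIP's central directory already records each member's name, uncompressed size, and CRC-32, so the listing can be built without decompressing anything. Here the listing is cached in a JSON file (the file name and all function names are hypothetical) and rebuilt only when the archive's size or modification time changes, assuming the file system updates those when an archive is rewritten. Matching (size, CRC-32) pairs are only candidates, since CRC-32 can collide, so a full content check is still needed before deleting anything.

```python
import json
import os
import zipfile


def archive_fingerprint(path):
    """Cheap staleness check: a rewritten archive changes size and/or mtime."""
    st = os.stat(path)
    return [st.st_size, st.st_mtime_ns]


def load_listing(path, cache):
    """Return [[name, size, crc32], ...] for a .zip, reopening the archive only
    when its cache entry is missing or stale. Nothing is decompressed."""
    key = os.path.abspath(path)
    fingerprint = archive_fingerprint(path)
    entry = cache.get(key)
    if entry is None or entry["fingerprint"] != fingerprint:
        with zipfile.ZipFile(path) as zf:
            # infolist() comes from the central directory, where sizes and CRCs are stored
            members = [[i.filename, i.file_size, i.CRC]
                       for i in zf.infolist() if not i.is_dir()]
        entry = cache[key] = {"fingerprint": fingerprint, "members": members}
    return entry["members"]


def candidate_duplicates(archive_paths, cache):
    """Group archive members by (size, CRC-32). Matches are candidates only."""
    groups = {}
    for path in archive_paths:
        for name, size, crc in load_listing(path, cache):
            groups.setdefault((size, crc), []).append((path, name))
    return [locations for locations in groups.values() if len(locations) > 1]


def load_cache(cache_file):
    """Read the JSON index, or start empty if it does not exist yet."""
    try:
        with open(cache_file, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def save_cache(cache_file, cache):
    """Write the index back so the next scan can skip unchanged archives."""
    with open(cache_file, "w", encoding="utf-8") as f:
        json.dump(cache, f)
```

If the program itself deletes a file from an archive, the archive is rewritten, its fingerprint changes, and the stale entry is rebuilt on the next scan.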
killermilind
Posts: 3
Joined: Tue Aug 29, 2023 11:28 am

Re: Finding duplicates in zipped or other compressed formats is EXTREMELY SLOW!

Post by killermilind »

I bought this software recently for the sole reason that it can spot duplicate folders, but searching through compressed archives was also a plus point in my decision to buy it.

Please fix this! Just unzip into a temp folder, or maybe even into RAM, and add it suitably to the path.
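Decompressing "in RAM" does not need a temp folder at all: Python's `zipfile.ZipFile.open` streams a member's decompressed bytes, which can be hashed chunk by chunk. In this sketch (.zip only; the function names and the `archive::member` path notation are hypothetical) members are first grouped by their recorded uncompressed size, so only members whose size matches another member's are actually decompressed.

```python
import hashlib
import zipfile
from collections import defaultdict


def hash_member(zf, info, chunk_size=1 << 20):
    """Decompress one member in chunks and hash it; nothing is written to disk."""
    h = hashlib.sha256()
    with zf.open(info) as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def in_memory_duplicates(archive_paths):
    """Group members of several .zip files by content, naming each one
    'archive::member' so it can sit alongside ordinary file paths."""
    by_size = defaultdict(list)
    open_archives = {}
    try:
        for path in archive_paths:
            zf = open_archives[path] = zipfile.ZipFile(path)
            for info in zf.infolist():
                if not info.is_dir():
                    by_size[info.file_size].append((path, info))
        groups = defaultdict(list)
        for entries in by_size.values():
            if len(entries) < 2:
                continue  # a unique size cannot have a duplicate; skip decompressing it
            for path, info in entries:
                digest = hash_member(open_archives[path], info)
                groups[digest].append(f"{path}::{info.filename}")
        return [sorted(v) for v in groups.values() if len(v) > 1]
    finally:
        for zf in open_archives.values():
            zf.close()
```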