Duplicate Cleaner v1.3 features / wishlist
Implemented so far:
-XP control styles
-Updated folder tree type interface
-Display/Ignore hardlinked files
-Un-hardlink files
-Export all files to CSV
-Rename files
-Invert Selection
-Date sort bug fixed
To Do / Requests:
-Localization (in progress), e.g. Spanish.
(Please let me know if you are interested in translations!)
-Sel Assistant option 'select all but one in each group by path'
-It would be even better if I could designate a "master folder tree", and possibly a master folder, from which nothing can be deleted, with any duplicates found in the other folders selected automatically.
-Fix bug where protected folder name check is case sensitive.
-Remove 'orphan' files from list when rest of duplicate group has been dealt with
- Add a quick right-click option on files in the duplicate list to "select all duplicate files in the same folder". Once you realise that a particular folder has a lot of duplicate files in it (such as photos), it would be nice to be able to right-click on one file and have it select all the duplicates in the same place.
-After you have run a scan and have a list of duplicate groups on screen, it would be much easier and less confusing if files that have been deleted disappeared from the list, along with the remaining files in their group that are no longer duplicates. Or at least grey them out, so that it is easier to see what you have left to go through. I like to go through the list incrementally, delete some, and then continue, but once files have been moved to the recycle bin they aren't duplicates any more and should disappear.
- Add a "just to be sure" option that, before you empty the recycle bin, scans each file recently added to it to make SURE another copy still exists elsewhere on the computer. Right now I am scared to empty the recycle bin, so a quick check that I'm not deleting the last copy of a file by accident would be reassuring.
Apart from changing the colours in the group column when switching the sort from name to size, and saving the column widths between searches (both of which I posted about before), showing the number of times each file is already hardlinked would be good.
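For what it's worth, the existing link count can be read straight off the filesystem; a rough Python sketch (NTFS/Windows assumed, the path is just a made-up example):

import os

def hardlink_count(path):
    # On NTFS, os.stat() fills st_nlink with the file's link count,
    # so a value greater than 1 means the file is already hardlinked elsewhere.
    return os.stat(path).st_nlink

# Hypothetical example path:
# print(hardlink_count(r"C:\temp\test\a\test.txt"))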
BTW, I just did a test and there is a situation that should be tested for, or at least a warning put in the doco.
Say \temp\test\a\test.txt exists. Create an NTFS junction (essentially a directory hardlink, see http://www.microsoft.com/technet/sysint ... ction.mspx) of \temp\test\a at \temp\test\d, run a duplicate file scan, and delete one of the two "copies": BOTH files are gone, because they are really the same file reached through two paths.
DFC either shouldn't scan beyond junction points, or should throw up a warning that, where a junction exists, deleting or hardlinking one "copy" could remove every copy of the file.
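For reference, a scanner can spot junctions before recursing into them; a rough Python sketch (Windows-only stat attributes assumed, not how DFC actually works):

import os
import stat

def is_junction_or_symlink(path):
    # True if 'path' is an NTFS reparse point (junction or symlink).
    # st_file_attributes is only populated on Windows.
    try:
        attrs = os.lstat(path).st_file_attributes
    except (OSError, AttributeError):
        return False
    return bool(attrs & stat.FILE_ATTRIBUTE_REPARSE_POINT)

def walk_skipping_junctions(root):
    # Yield file paths under 'root', refusing to descend into junctions.
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune junctioned subdirectories in place so os.walk never enters them.
        dirnames[:] = [d for d in dirnames
                       if not is_junction_or_symlink(os.path.join(dirpath, d))]
        for name in filenames:
            yield os.path.join(dirpath, name)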
Thanks again.
Mark
?? A 'bit' more reliable? (sorry, Math geek mode ON)
CRCs are 32-bit. If you generate random files, all different, you would need about 77,162 files before there was a 50% chance of two files having different contents but a matching CRC (not a 50% chance of one file matching the last one, but of some two files in the group of 77,162 colliding — the birthday problem).
MD5 sums are 128-bit (about 3.4x10^38 possible values). If you compared the MD5 sums of random, different files, you would need ~2x10^19 files — 20 million million million — before some two of them had a 50% chance of matching hashes. I doubt that many different files will ever be created by mankind; that's about 3 billion files for every person on earth.
Files needed for an x% chance of two 32-bit CRCs colliding:
30084 10%
43781 20%
55352 30%
66241 40%
88718 60%
101695 70%
117579 80%
140636 90%
160414 95%
198890 99%
2^32+1 100%
ain't Excel and a bit of maths wonderful.
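The figures above fall out of the standard birthday approximation n ≈ sqrt(2·N·ln(1/(1−p))) with N = 2^32; a small Python sketch that reproduces the table to within a few files' rounding:

import math

BINS = 2 ** 32  # number of distinct 32-bit CRC values

def files_for_collision_chance(p, bins=BINS):
    # Approximate number of random files needed for probability p
    # that at least two of them share a CRC (birthday approximation).
    return math.ceil(math.sqrt(2 * bins * math.log(1 / (1 - p))))

for pct in (10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{files_for_collision_chance(pct / 100):>7}  {pct}%")

# The 50% point comes out at ~77,163 files; swap in bins=2**128 for MD5
# and the 50% point jumps to roughly 2.2e19 files.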
Actually, that sounds like a great idea.
And not bothering to do even the CRC check when the files are already hardlinked together might be a worthwhile speedup too, i.e. only do the CRC if the files match in size and aren't hardlinked. Mind you, since computing the CRC and the MD5 sum both require reading the file, you'd want to hope it stayed in the cache.
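Concretely, what I have in mind is roughly this (a Python sketch under those assumptions, not DFC's actual code):

import os
import zlib

def file_id(path):
    # (device, inode) pair; on NTFS os.stat() fills these from the volume
    # serial number and file index, so hardlinked paths share the same id.
    st = os.stat(path)
    return (st.st_dev, st.st_ino)

def crc32_of(path, chunk=1 << 20):
    # CRC-32 of a file's contents, read in 1 MiB chunks.
    crc = 0
    with open(path, "rb") as f:
        while block := f.read(chunk):
            crc = zlib.crc32(block, crc)
    return crc

def same_content(a, b):
    # Size check first, then the hardlink short-circuit, then the CRC.
    if os.path.getsize(a) != os.path.getsize(b):
        return False
    if file_id(a) == file_id(b):
        return True  # same physical file, no need to read it at all
    return crc32_of(a) == crc32_of(b)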
And I shouldn't have put this discussion in the features request thread, sorry.