Hard links, sorting and include or exclude?

The best solution for finding and removing duplicate files.

Post by Fool4UAnyway »

Well, this is one great utility! Thanks, guys! I've been using Text Crawler for some time, and I just started using Duplicate Cleaner to reduce the disk space that identical (mostly "empty") map tiles take up on my hard drive(s). It's nice that I can hard link those identical files to a "named" tile in a parent folder.

First of all, how does the utility sort files when you click the Links header in the Duplicate Files list?

I can't really figure it out: there are groups whose hard link counts are listed as 5, 46, 37 103 in that order, or the other way around. How is this order determined?

Second, when I find duplicate tiles, I look at the picture, copy it to the parent folder of all searched folders and give it a logical name. Now, when I have created yet another folder, I would like to be able to find all duplicate tiles in the new folder(s), compared to the parent folder and all other sibling folders.

I can, of course, compare the new folder(s) to the parent folder only first, to find all tiles already known to be duplicates. But I would have to compare the new folder(s) to the other sibling folders in a second pass to find if there are any _new_ duplicate matches between them.

Is there a way I could do this in a single pass, without getting a huge number of already hard linked tiles?
Is there any way of specifying the "master" tile that duplicates are linked to? Of course, they are in the parent folder.

Post by Fool4UAnyway »

To be correct: the numbers above should be 5, 46, 37(,) and 103. I do not have a clue why 37 would be listed _below_ 46 etc.

What I mean in the second part of my message is that I would like to show all already hard linked files as a single entry, in my case the "master" tile in the parent folder. So I would like to exclude all _other_ hard links while keeping one entry I can compare against tiles in the new folder(s). This would leave me with duplicates in the new folders compared to the master tiles, and/or duplicates in the new folders compared to the other sibling folders that now contain duplicates that weren't matched before.
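
For illustration only, here is a minimal Python sketch of the "one entry per hard linked file" idea. It is not how Duplicate Cleaner works internally; it assumes the file system reports a stable file ID through `os.stat` (the `st_dev` and `st_ino` fields), and the function name `collapse_hard_links` is made up for this example.

```python
import os
from collections import defaultdict


def collapse_hard_links(paths):
    """Return {representative: [other hard links to the same file]}.

    Paths that share (st_dev, st_ino) are names for the same file on disk.
    The first path seen for each file becomes the representative, so list
    the parent ("master") folder first to keep its tiles as the entries.
    """
    groups = defaultdict(list)
    for path in paths:
        st = os.stat(path)
        groups[(st.st_dev, st.st_ino)].append(path)
    return {members[0]: members[1:] for members in groups.values()}
```

Feeding only the representatives (the keys of the returned dictionary) into a duplicate search would compare each already linked tile once, instead of once per link.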

P.S.

I am not quite sure anymore about where I should have posted. I didn't look at the "Duplicate Cleaner Labs" forum after seeing the title and thinking it would contain practices or instructions about how to do things. Later I saw the description mentioning "suggestions". My question may not be as general as the description for the "Duplicate Cleaner" title suggests.

Post by Fool4UAnyway »

Is there a maximum (of 1024) to the number of files that hard link to the master file?

One master file already had 308 links. I wanted to link another 1358 files to the master file. I got an error, no files were linked.

I unlinked the master file. Then I linked the 1358 files again. This was successful for only 1023 files...

Is there a better way to link a huge number of files to the same master file, instead of having to create (keep) copies of the master file, to be able to link more than 1024 files to it?

Post by Fool4UAnyway »

It seems the entries in the Links column are sorted as _text_ rather than as numbers... Please correct this!
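
A small Python sketch reproduces the effect with the four counts mentioned earlier. This only illustrates text versus numeric sorting in general; it says nothing about how Duplicate Cleaner is actually implemented.

```python
link_counts = ["5", "46", "37", "103"]

# Sorted as text: strings are compared character by character,
# so "103" comes before "37" because "1" < "3".
as_text = sorted(link_counts)

# Sorted as numbers: convert each entry to an int for the comparison.
as_numbers = sorted(link_counts, key=int)

print(as_text)                             # ['103', '37', '46', '5']
print(sorted(link_counts, reverse=True))   # ['5', '46', '37', '103']
print(as_numbers)                          # ['5', '37', '46', '103']
```

The descending text sort gives exactly the 5, 46, 37, 103 order seen in the list.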

Post by Fool4UAnyway »

You can read more about Hard Links (Junctions, Cloning etc.) here:

"Link Shell Extension"
http://schinagl.priv.at/nt/hardlinkshel ... llext.html

This documentation states that the number of hard links to a single file is limited to 1023.

Post by Fool4UAnyway »

The search for duplicate files in all _small_ level 1 (maximum zoom level) and the "general" map tiles, 137195 files in total, found 46 groups of doubles.

34451 files have doubles.

Most groups only have a small number of duplicates, and already have a "general" file that duplicates in other levels have been linked to.

However, there are at least 2 groups containing literally thousands of doubles. I want to link them, but, of course, I can't do this in a single pass: there is a limit of 1023 hard links to a single file.

So, I would like to be able to:
- specify the master file(s) to link duplicates to
- if not already present, be able to copy one of the duplicate files to the "general" folder, (re)name it and make that the master file, to link all the duplicates to
- have Duplicate Cleaner create all the master files (with number suffix for the name I enter), if necessary, and link all the duplicates to those master files, with the maximum number of 1023 links to each numbered master file

What will be left is a general (parent) folder containing the master files for the duplicates in all of the subfolders, plus subfolders containing hard links to those master files and the files that are unique across all subfolders.
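
To make the request concrete, the numbered-master idea can be sketched in Python roughly as follows. This is a sketch under assumptions, not an existing Duplicate Cleaner feature: the names `plan_masters` and `apply_plan` are hypothetical, the limit of 1023 links per master is the figure reported in this thread, and all files in the list are assumed to have been verified as identical beforehand.

```python
import os
import shutil

# Limit reported in this thread; adjust if your file system differs.
MAX_LINKS_PER_MASTER = 1023


def plan_masters(duplicates, base_name, ext, max_links=MAX_LINKS_PER_MASTER):
    """Split duplicates into batches, one numbered master file name per batch."""
    plan = {}
    for start in range(0, len(duplicates), max_links):
        number = start // max_links + 1
        plan[f"{base_name}_{number}{ext}"] = duplicates[start:start + max_links]
    return plan


def apply_plan(plan, master_dir):
    """Create each master if needed, then replace its duplicates by hard links."""
    for master_name, batch in plan.items():
        master_path = os.path.join(master_dir, master_name)
        if not os.path.exists(master_path):
            # Copy one of the duplicates to the "general" folder as the master.
            shutil.copy2(batch[0], master_path)
        for dup in batch:
            # Link under a temporary name first, then swap it into place,
            # so the duplicate is never missing if linking fails.
            tmp = dup + ".linktmp"
            os.link(master_path, tmp)
            os.replace(tmp, dup)
```

With the 1358 files mentioned earlier, `plan_masters(files, "empty", ".png")` would yield two masters: `empty_1.png` with 1023 links and `empty_2.png` with the remaining 335.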

Could you help me with that?

Post by Fool4UAnyway »

> Could you help me with that?

Post by DV »

Hi, sorry for the delay in replying. I'll try to address some points. Glad you like the program, by the way!

- The 'Hard Links' column is being sorted as a text field, as you realized. This bug is addressed in DC release 1.4.4, which should be out in the next few days.

- The maximum number of hard links to a file is 1023 (a 10-bit field used on NTFS, according to Wikipedia).

- Your idea is interesting. This is the first report I have had of someone running up against the hard link limit. The suggestion you mention could be implemented in Duplicate Cleaner 2.0, which is under development now, but it wouldn't be a high priority (still lots to do). Hopefully fixing the sorting bug will help you a little.

Thanks

Post by Fool4UAnyway »

_Any_ answer is better than none at all. Thanks for replying. Thanks for fixing the sorting bug.

I understand you are busy with the new version. Better to take it easy and do a good job than rush into disaster!