Target normal files to existing hardlinks (Ticket #736601)

Post by **CP44** » Sat Jan 30, 2016 4:42 am

Duplicate Cleaner Pro 3.2.7
Windows 7 x64 SP1
Target drive: NTFS v3.1 (from fsutil fsinfo ntfsinfo)
Target normal files to existing hardlinks

Is there a way to use Duplicate Cleaner (Pro, since that is the edition with hardlink support) to scan an area for duplicates and then replace any normal files it finds with hardlinks to the existing hardlinked files/nodes?

The options currently presented to users are:
• Hardlinks, mountpoints, and junctions can be excluded
• Hardlinks can be counted

So you can find out which files are already hardlinks, but with a large number of files and many variations, a duplicate group like this one becomes tedious to resolve:
File A = standard
File B = hardlink 1
File C = hardlink 1
File D = standard
File E = hardlink 1
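
To make the example group concrete, here is a minimal sketch of how the five files could be sorted by the node they point to, outside of Duplicate Cleaner. It assumes Python 3, where `os.stat` reports a volume ID (`st_dev`) and file ID (`st_ino`) on NTFS as well; `group_by_node` is just a name I made up for illustration, not anything the program exposes.

```python
import os
from collections import defaultdict

def group_by_node(paths):
    """Map each (volume, file ID) pair to the paths that share that node."""
    nodes = defaultdict(list)
    for path in paths:
        st = os.stat(path)
        # Two paths are hardlinks to the same data only if both IDs match.
        nodes[(st.st_dev, st.st_ino)].append(path)
    return dict(nodes)
```

For the group above this would give three nodes: one holding B, C, and E, and one each for A and D.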

In my case, where the above occurs in situations where I am reasonably sure that none of the 5 files will be targeted for changes, I would like all 5 files to share the same hardlinked data for minimum space usage on the file system. Ideally, I would also like them to carry the earliest file creation/modification time found on any file in the duplicate group, as that indicates the earliest point in history at which the data came into existence. (Some users may prefer the latest time.)
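
As a rough sketch of the result I am after (not of how Duplicate Cleaner works internally), something like the following would do it for one duplicate group. Assumptions: the paths have already been verified as identical in content, they all sit on the same volume (hardlinks cannot span volumes), and the function name `consolidate` and the `.relink_tmp` suffix are invented for this example. Note that `os.utime` only sets access and modification times, so the creation time is not handled here.

```python
import os
from collections import defaultdict

def consolidate(paths):
    """Point every path in one duplicate group at a single node.

    Returns the paths that were relinked; existing links to the
    chosen node are left untouched.
    """
    nodes = defaultdict(list)
    for path in paths:
        st = os.stat(path)
        nodes[(st.st_dev, st.st_ino)].append(path)

    # Remember the earliest modification time before anything is replaced.
    earliest = min(os.stat(path).st_mtime_ns for path in paths)

    # Keep the node that already has the most names in this group.
    keeper_key = max(nodes, key=lambda key: len(nodes[key]))
    keeper = nodes[keeper_key][0]

    relinked = []
    for key, members in nodes.items():
        if key == keeper_key:
            continue  # already hardlinked to the keeper, nothing to do
        for path in members:
            tmp = path + ".relink_tmp"  # hypothetical temporary name
            os.link(keeper, tmp)        # new name for the keeper's data
            os.replace(tmp, path)       # swap it in over the old file
            relinked.append(path)

    # All names now share one node, so one call updates them all.
    atime_ns = os.stat(keeper).st_atime_ns
    os.utime(keeper, ns=(atime_ns, earliest))
    return relinked
```

With the A to E group above, only A and D would be touched; B, C, and E keep their existing links.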

There is no obvious way to "add" duplicates to an existing hardlink collection, because hardlinks are either excluded entirely (so they are never displayed to the user) or shown without any controls to select/filter based on hardlinks (e.g. a Selection Assistant option such as "select files without hardlinks"). If a group such as the above example is targeted for hardlink creation (e.g. "All but one file in each group"), dctmp__* hardlink files are created alongside the existing hardlinks, the existing hardlinks are deleted, and the new hardlink files are renamed to take the place of the original hardlinks (as confirmed by ProcMon), resulting in seemingly unneeded operations.

The behavior above does handle cases where the same content is stored in multiple locations as separate groups of hardlinked files (separate nodes), but it seems like rough handling overall. The workaround I see at the moment (to avoid changes to existing hardlinks) is to include hardlinks in the Duplicate Files view, sort by hardlink count, select all rows where the hardlink count = 0, enable "Selection Assistant: Work only on currently selected rows", select "All but one file in each group", and then resolve any "Groups with all files marked" issues.