Delete all but one dupe within each file path location

jackThom
Posts: 16
Joined: Tue May 27, 2014 7:21 am

Post by jackThom

I searched throughout the application, as well as these forums, trying to find a way to accomplish this, but I don't see one.

I see the option to "select all but one file in each group".
But what I'd like to do is select all but one file in each path (ideally with the newest duplicates being the ones selected).

Basically, I have a lot of main folders with subfolders that may contain duplicate files, not only within the subfolders themselves but also across them. I want to start my deduplication by eliminating exact duplicates residing in the exact same location, since there is virtually no scenario where duplicates meeting that criterion are necessary. Not even hardlinks would be of interest, as it's the same location.

In other words:

Main Folder
-Subfolder A
--File Folder 1
---file1.jpg
---file1-duplicate.jpg
---file1-duplicate.jpg
---[file2.jpg, file3.jpg...]

--File Folder 2
---file1-duplicate.jpg
---file1-duplicate.jpg
---file1-duplicate.jpg
---[file4.jpg, file5.jpg...]

...and I have Subfolders B, C, D [...] that contain the same kind of duplication within and across their File Folders. What I want to do is delete just the newest duplicates within each folder. It is fine if a single copy remains in each of several different folders.

Is there any plan to add a "select all but one file in each path" option?
And, more importantly, is there currently a process that would accomplish this?

I like the options currently in the selection wizard. They are thoughtful and largely cover what I need. I think it would be greatly improved if the same options were also offered "in each path", in addition to the existing "in each group" options. This would give users much more flexibility to dedupe according to their needs.
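For anyone wanting to see the requested "in each path" selection spelled out, here is a minimal Python sketch. It is not a Duplicate Cleaner feature; it assumes the duplicate groups (files with identical content) have already been found by a scan, and it treats modification time as the measure of "newest". The names `FileEntry` and `mark_all_but_oldest_per_folder` are hypothetical.

```python
import os
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical record for one file in a duplicate group."""
    path: str
    mtime: float  # modification time; used here as the measure of "newest"


def mark_all_but_oldest_per_folder(groups):
    """Return the paths to mark for deletion.

    Each duplicate group is split by parent folder; the oldest copy in each
    folder is kept and the newer copies in that same folder are marked.
    A lone copy in a folder is never marked, so one copy per folder survives
    even when the same file exists in several folders.
    """
    marked = []
    for group in groups:
        by_folder = defaultdict(list)
        for entry in group:
            by_folder[os.path.dirname(entry.path)].append(entry)
        for entries in by_folder.values():
            # Oldest first; ties are broken by path so the result is stable.
            entries.sort(key=lambda e: (e.mtime, e.path))
            marked.extend(e.path for e in entries[1:])
    return marked


# Example based on the folder layout above (distinct names, since one folder
# cannot hold two files with the same name).
group = [
    FileEntry("Main/A/Folder1/file1.jpg", 100.0),
    FileEntry("Main/A/Folder1/file1 (copy).jpg", 200.0),
    FileEntry("Main/A/Folder2/file1.jpg", 150.0),
]
print(mark_all_but_oldest_per_folder([group]))
# prints ['Main/A/Folder1/file1 (copy).jpg']
```

Changing the sort key to `(-e.mtime, e.path)` would instead keep the newest copy in each folder and mark the older ones.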
jackThom
Posts: 16
Joined: Tue May 27, 2014 7:21 am

Post by jackThom

Well this is disappointing. :(

I would have thought there'd be at least some kind of response by now. I suppose I can just go with a semi-manual algorithm I worked out to accomplish this, but you'd think there would be a built-in way to do it. (If anyone else is trying to do the same thing, let me know and I'll post the process.)

It occurred to me after my original post that another way to accomplish what I'm after is a "scan only against self" option. With that, you could just select all the folders you want to dedupe, drag them to the search paths area, and scan each one only against itself, resulting in a list of duplicates found only within each specific folder, which is the content I want to delete.

Is that an option?
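A "scan only against self" pass like the one proposed could be sketched as follows: each search path is hashed and compared only with its own contents, so a file that also exists under another search path is never reported. This is a hypothetical Python illustration, not an actual Duplicate Cleaner option; `find_dupes_within_each_path` is an invented name, and SHA-256 over whole file contents is assumed as the duplicate test.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks (1 MiB by default)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_dupes_within_each_path(search_paths):
    """Return {search_path: [duplicate groups]}, where each group holds files
    with identical content found under that one search path. Files are never
    compared across different search paths."""
    results = {}
    for root in search_paths:
        by_hash = defaultdict(list)
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full_path = os.path.join(dirpath, name)
                by_hash[file_digest(full_path)].append(full_path)
        # Keep only contents that occur more than once under this root.
        results[root] = [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
    return results
```

Each File Folder would be passed as its own search path, e.g. `find_dupes_within_each_path(["Main/A/Folder1", "Main/A/Folder2"])`. For simplicity the sketch hashes every file; comparing file sizes first would avoid reading files that cannot be duplicates. It also does not guard against one search path being nested inside another.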
therube
Posts: 615
Joined: Tue Jun 28, 2011 4:38 pm

Post by therube

I'd thought about that, but it didn't sound like what you were after, as it would not delete the duplicates within a particular File Folder:

--File Folder 1
---file1.jpg
---file1-duplicate.jpg
---file1-duplicate.jpg
---[file2.jpg, file3.jpg...]

My only thought was to scan each File Folder individually first, if that is feasible.
That would wipe out all dups within each File Folder.

Then you could expand the scan to the Subfolders.
At that point, "scan against self" might be a better fit.
jackThom
Posts: 16
Joined: Tue May 27, 2014 7:21 am

Post by jackThom

Oh yeah, I meant going into each Subfolder, selecting all of its File Folders, and dragging them to the "search paths" area. You would do this for each Subfolder, and then, once every File Folder is in the search paths, run a single "scan only against self" on all of those File Folders at once.

That would accomplish what I'm after. But you're right, it still would not be a full solution, as it would require a bit of manual work: navigating into each Subfolder one by one, highlighting everything, and dragging it to the scan window. Definitely not pretty, but manageable, even with a couple hundred Subfolders.

Ideally Duplicate Cleaner would have both of these options:

-"scan only against self" for file paths

-In the Duplicate Files selection assistant, under the "Mark" options, two additional options for each category, adding "in each path" Mark options alongside the existing "in each group" options.
jackThom
Posts: 16
Joined: Tue May 27, 2014 7:21 am

Post by jackThom

But to respond to your idea: yeah, it crossed my mind, but I simply have too many File Folders to make that feasible. If I had a "scan only against self" option, I could manage going into each Subfolder, dragging all its File Folders to the search paths window, and doing a single scan on all of them at once.

But running a scan and deduplicating within every single File Folder one at a time would take days or weeks.
dcoberlin
Posts: 8
Joined: Tue Jun 17, 2014 2:42 am

Post by dcoberlin

The duplicate file program I previously used could sort files by file path, by date and time, or by size, each in ascending or descending order. One could then use "select all duplicates", which selected every duplicate in the chosen order while leaving one copy of each file unselected.
Perhaps this is doable in Duplicate Cleaner. Does anyone know how to accomplish this?
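The sort-then-select approach described here boils down to: within each duplicate group, sort by the chosen keys, leave the first file unselected, and select the rest. Below is a hypothetical Python sketch of that idea; it is not how Duplicate Cleaner or the other program actually implements it, and the `(path, mtime, size)` tuple layout and the `select_all_but_first` name are assumptions.

```python
from operator import itemgetter


def select_all_but_first(group, key_fields=(0, 1), descending=False):
    """Sort one duplicate group and select everything except the first file.

    Each entry is a hypothetical (path, mtime, size) tuple. key_fields picks
    the sort keys by index (0 = path, 1 = date/time, 2 = size); the first file
    in the resulting order stays unselected.
    """
    ordered = sorted(group, key=itemgetter(*key_fields), reverse=descending)
    return [entry[0] for entry in ordered[1:]]


group = [
    ("C:/Photos/B/x.jpg", 300, 2048),
    ("C:/Photos/A/x.jpg", 200, 2048),
    ("C:/Photos/A/x (1).jpg", 100, 2048),
]
# Path ascending, then date: keeps "C:/Photos/A/x (1).jpg" unselected.
print(select_all_but_first(group))
# Date descending: keeps the newest copy, "C:/Photos/B/x.jpg", unselected.
print(select_all_but_first(group, key_fields=(1,), descending=True))
```

Note that `reverse=True` reverses every sort key at once; mixing ascending and descending keys would need a different key function.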
jackThom
Posts: 16
Joined: Tue May 27, 2014 7:21 am

Post by jackThom

@dcoberlin,

It took revisiting my dupes to get re-familiarized with Duplicate Cleaner and realize that you can basically accomplish what you're talking about with DC. There are actually a couple of different ways to do it, too.

In fact, in a way, what you're describing is partly the process I ended up creating to accomplish what I was trying to do. As I mentioned before, it's basically just a grungy hack, relying mostly on the "shortest/longest path name in each group" mark option, since that's the only available parameter that deals with path names. It's not pretty, but it gets the job done.

Like I said before, if anyone is interested or is looking to accomplish the same thing, lemme know and I can post the steps.