duplicate files removing feature ???

Post by LGS on Tue Jul 19, 2016 8:26 am

May I kindly suggest adding a "duplicate file removal" feature to Syncovery (matching either on filename + size or on a full byte-by-byte comparison)?

I currently use "reasonable noclone", but this tool (like the other ones) is not very reliable with large numbers of files, and is not very user-friendly in some respects.

I really miss the robustness, speed, flexibility and ease of use of Syncovery when I have to process my duplicates with other tools.

I am quite sure that only a few things would need to be added to the Syncovery engine and menus to cover these specific "duplicate cleaning" tasks.

Thanks in advance for your reply!
Best regards
L.
LGS
 
Posts: 11
Joined: Wed Jan 15, 2014 7:19 am

Re: duplicate files removing feature ???

Post by superflexible on Wed Jul 20, 2016 6:42 pm

It might be possible to add. But what would be the conditions to detect duplicates, and what actions should be taken? Which settings / options do you think would be necessary? Could the feature be part of sync jobs with two "sides" (left and right), or just a standalone feature operating only on one side?
superflexible
Site Admin
 
Posts: 2478
Joined: Thu Dec 31, 2009 3:08 pm

Re: duplicate files removing feature ???

Post by LGS on Thu Jul 21, 2016 3:12 am

Hello Tobias, thanks for your quick reply
Concerning the criteria for detecting duplicates: I would suggest several modes, from a strong match (file name, extension, size, date, and exact byte-by-byte content) to a weak match (file name + size, or even file name alone), at the choice (and risk) of the user. Some fuzzy matching might also be offered (similar images, or the same image in portrait instead of landscape), but of course this last option would make the algorithms much more complicated.
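To make the idea of graded match modes concrete, here is a minimal Python sketch. The mode names (`name`, `name_size`, `name_size_date`) and both function names are hypothetical and are not part of Syncovery. Lowercasing the file name is an assumption that suits case-insensitive file systems. The strongest mode (exact content) would still need a content comparison on top of this pre-grouping.

```python
import os
from collections import defaultdict

def match_key(path, mode):
    """Return the grouping key for one file under a (hypothetical) match mode."""
    # Assumption: names are compared case-insensitively.
    name = os.path.basename(path).lower()
    if mode == "name":
        return (name,)
    st = os.stat(path)
    if mode == "name_size":
        return (name, st.st_size)
    if mode == "name_size_date":
        # Whole-second timestamps, to tolerate sub-second differences.
        return (name, st.st_size, int(st.st_mtime))
    raise ValueError(f"unknown mode: {mode}")

def group_candidates(paths, mode):
    """Group paths sharing a key; only groups of two or more are candidates."""
    groups = defaultdict(list)
    for p in paths:
        groups[match_key(p, mode)].append(p)
    return [g for g in groups.values() if len(g) > 1]
```

A weaker mode produces larger candidate groups, which is why the choice is left to the user.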
Concerning the byte-by-byte comparison: in terms of speed, there may be a smarter approach than simply comparing the two files linearly from the beginning.
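One possible way to avoid a purely linear comparison, sketched in Python under stated assumptions: compare sizes first, then probe the end of both files, and only then read both files chunk by chunk, stopping at the first difference. The function name `same_content`, the 4 KiB probe and the 1 MiB chunk size are arbitrary choices for illustration, not anything Syncovery is known to do.

```python
import os

CHUNK = 1 << 20  # 1 MiB read size (arbitrary choice)

def same_content(a, b, probe=4096):
    """Byte-exact comparison that tries cheap checks before a full read."""
    size = os.path.getsize(a)
    if size != os.path.getsize(b):
        return False  # files of different sizes can never match
    if size == 0:
        return True
    with open(a, "rb") as fa, open(b, "rb") as fb:
        # Cheap probe of the last bytes: catches files that start alike
        # but end differently without reading everything in between.
        if size > probe:
            fa.seek(-probe, os.SEEK_END)
            fb.seek(-probe, os.SEEK_END)
            if fa.read(probe) != fb.read(probe):
                return False
            fa.seek(0)
            fb.seek(0)
        # Full comparison in chunks, stopping at the first difference.
        while True:
            ca = fa.read(CHUNK)
            cb = fb.read(CHUNK)
            if ca != cb:
                return False
            if not ca:
                return True
```

When many files of the same size must be compared with each other, hashing each file once and comparing the digests may be cheaper than pairwise reads.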
I generally use two search modes: either I search for duplicates in one single tree (or several trees considered as a whole), or I compare two distinct trees, one being considered the “master”. (Some filters on file extensions and exclusions might apply, as in Syncovery.)
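Both search modes can be served by one scanning step, as in the following Python sketch. The function `scan` and its parameters `include_ext` and `exclude_dirs` are hypothetical names for illustration. Each file is reported together with the root it came from, so a later step can tell a master tree from the tree being cleaned.

```python
import os

def scan(roots, include_ext=None, exclude_dirs=()):
    """Yield (root, path) for every matching file under the given roots."""
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            # Prune excluded directory names in place so os.walk skips them.
            dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
            for name in filenames:
                ext = os.path.splitext(name)[1].lower()
                if include_ext is None or ext in include_ext:
                    yield root, os.path.join(dirpath, name)
```

In single-tree mode every root is treated alike. In master mode, files whose root is the master tree would simply never be offered for removal.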
The program shows the list of duplicates, and either I manually select the ones to remove (sorting by path puts one tree before the other), or I use rules to let the program automatically select the duplicates to be removed (newer ones, those with a longer file name, a larger size, or any combination of these criteria). One safety feature can prevent all duplicates in a group from being marked by mistake; another can ensure that no group is left unmarked.
Syncovery's two-sided presentation might be useful when a master file set is used.
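The automatic selection with its two safeguards could look roughly like this in Python. `FileInfo` and `mark_for_removal` are hypothetical names, and the rule shown (keep the oldest file, then the one with the shortest name) is only one possible combination of the criteria listed above.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class FileInfo:
    path: str
    size: int
    mtime: float

def mark_for_removal(group):
    """Return (keeper, to_remove) for one group of duplicate files."""
    if len(group) < 2:
        raise ValueError("a duplicate group needs at least two files")

    def rank(f):
        # Lowest rank is kept: oldest first, then shortest file name,
        # then the path itself as a deterministic tie-breaker.
        return (f.mtime, len(os.path.basename(f.path)), f.path)

    ordered = sorted(group, key=rank)
    # Exactly one file is kept by construction, so a group is never
    # fully marked and never left without any marked file.
    return ordered[0], ordered[1:]
```

Because the function always splits a group into one keeper and at least one file to remove, both safeguards hold regardless of which ranking rule is plugged in.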
I think you could find some inspiration by looking at what “reasonable noclone” does. Its concept and presentation are interesting, but its reliability suffers with large numbers of files, and it is sometimes incomprehensibly slow, especially for people used to the usually blazing-fast processing of Syncovery.
I hope things are a bit clearer. Don't hesitate to come back with more questions.

Louis
LGS
 
Posts: 11
Joined: Wed Jan 15, 2014 7:19 am

Re: duplicate files removing feature ???

Post by superflexible on Tue Jul 26, 2016 8:01 am

Many thanks for these details! I will consider it for a future Syncovery release.
superflexible
Site Admin
 
Posts: 2478
Joined: Thu Dec 31, 2009 3:08 pm

Re: duplicate files removing feature ???

Post by wizard-ict on Fri Aug 05, 2016 8:13 am

Hi

I would like to weigh in on this subject if I may. I would like to see this deduplication on the right side only during sync operations (i.e. deduplicate the backup but leave the original data alone). I'm sure others would like both sides etc, so that would probably be optional.

Ideally I would like a hard link created (where supported by the file system) or something similar, so that the file is still referenced but is just a link to the other copy. This would also aid subsequent jobs to be able to reference the file and not have to upload it again every run thinking it's a new file.
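As a rough illustration of the hard-link idea, here is a Python sketch using the standard `os.link` and `os.replace` calls. The function name and the `.dedup-tmp` suffix are hypothetical, and the sketch assumes the caller has already verified that both files have identical content and live on the same volume.

```python
import os

def replace_with_hard_link(keeper, duplicate):
    """Replace `duplicate` with a hard link to `keeper`."""
    tmp = duplicate + ".dedup-tmp"  # hypothetical temporary name
    # Raises OSError if the file system does not support hard links
    # or if the two paths are on different volumes.
    os.link(keeper, tmp)
    try:
        # Swap the link into place; the old duplicate is only dropped
        # once the link exists, so the name never goes missing.
        os.replace(tmp, duplicate)
    except OSError:
        os.remove(tmp)
        raise
```

After the call both names refer to the same data on disk, so the space is used only once. Editing the file in place through either name then changes both, which is one reason to offer this only as an option.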


This is probably a more useful feature for cloud storage (particularly where the storage is charged per GB) than for local storage, where terabytes are available at low cost. Of course, cloud storage often doesn't support hard links, so a proprietary file type that references the other file may be required.
wizard-ict
 
Posts: 7
Joined: Wed Feb 10, 2016 10:16 am

