by LGS on Thu Jul 21, 2016 3:12 am
Hello Tobias, thanks for your quick reply
Concerning the synchronization criteria: I would suggest several modes, from strong match (file name, extension, size, date, exact byte-to-byte content) down to weak match (file name + size, or even file name alone), at the choice (and risk) of the user. Some fuzzy matching might also exist (similar images, or the same image in portrait instead of landscape orientation), but of course this last option would make the matching algorithms considerably more complicated.
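To illustrate, here is a minimal Python sketch of how such selectable strictness levels might be expressed. The MatchMode names and the match_key helper are purely hypothetical, not anything Syncovery actually provides:

```
import os
from enum import Enum

class MatchMode(Enum):
    """Hypothetical match strictness levels (names are illustrative)."""
    NAME_ONLY = 1
    NAME_SIZE = 2
    NAME_SIZE_DATE = 3

def match_key(path: str, mode: MatchMode):
    """Build a grouping key for a file under the chosen mode.

    Files with equal keys are duplicate *candidates*; the strongest
    mode would still confirm with a byte-to-byte comparison afterwards.
    """
    st = os.stat(path)
    name = os.path.basename(path).lower()
    if mode is MatchMode.NAME_ONLY:
        return (name,)
    if mode is MatchMode.NAME_SIZE:
        return (name, st.st_size)
    return (name, st.st_size, int(st.st_mtime))
```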
Concerning the byte-to-byte comparison method: for speed, perhaps there is a cleverer way than simply comparing the two files linearly from the beginning.
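One common trick (just a suggestion, not necessarily what Syncovery should do) is to reject non-matches cheaply before any full comparison: compare sizes first, then the first and last few kilobytes, and only run a complete byte-to-byte pass on the survivors. A rough Python sketch, with a hypothetical probably_equal helper and an arbitrary 64 KiB probe size:

```
import os

CHUNK = 64 * 1024  # 64 KiB probe size (arbitrary choice)

def probably_equal(a: str, b: str) -> bool:
    """Cheap pre-checks before a full byte-to-byte comparison.

    Compares sizes first, then the first and last chunks, so most
    non-duplicates are rejected after reading only a few KiB.
    """
    size = os.path.getsize(a)
    if size != os.path.getsize(b):
        return False
    with open(a, "rb") as fa, open(b, "rb") as fb:
        if fa.read(CHUNK) != fb.read(CHUNK):
            return False
        if size > CHUNK:
            fa.seek(-CHUNK, os.SEEK_END)
            fb.seek(-CHUNK, os.SEEK_END)
            if fa.read(CHUNK) != fb.read(CHUNK):
                return False
    return True  # still a candidate: confirm with a full compare
```

The final confirmation on the surviving candidates could then be an ordinary full comparison, for example Python's filecmp.cmp(a, b, shallow=False).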
I generally use two search modes: either I search for duplicates within one single tree (or several trees considered as a whole), or I compare two distinct trees, one being considered the “master”. (Some filters on file extensions and exclusions might apply, as in Syncovery.)
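For illustration, a minimal sketch of the grouping step that covers both modes, reusing the hypothetical match_key above; the master-tree variant would simply restrict removals to files outside the master tree:

```
import os
from collections import defaultdict

def scan(root: str):
    """Yield every file path under root (extension/exclusion filters would hook in here)."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            yield os.path.join(dirpath, name)

def find_duplicates(roots, key_func):
    """Group files from one or more trees by a match key.

    Returns only the groups with more than one member,
    i.e. the duplicate candidates.
    """
    groups = defaultdict(list)
    for root in roots:
        for path in scan(root):
            groups[key_func(path)].append(path)
    return {k: v for k, v in groups.items() if len(v) > 1}
```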
The program shows the list of duplicates, and either I manually select the ones to remove (sorting on the path will put one tree before the other), or I use rules to let the program automatically select the duplicates to be removed (the newer ones, those with the longer file name, the larger size, or any combination of these criteria). One safety feature could prevent all files in a group from being marked by mistake; another could ensure no group is left unmarked.
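A sketch of such an automatic selection rule with the first safety feature built in; select_for_removal and the example keep rule are hypothetical names, not an existing feature:

```
import os

def select_for_removal(group, keep_rule):
    """Mark duplicates for removal, keeping at least one file per group.

    keep_rule picks the single file to preserve; everything else
    in the group is marked.
    """
    keep = keep_rule(group)
    marked = [p for p in group if p != keep]
    # Safety invariant: never mark the whole group.
    assert len(marked) < len(group)
    return marked

# Example rule matching "remove the newer ones": keep the oldest file.
oldest = lambda group: min(group, key=os.path.getmtime)
```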
The Syncovery two-sided presentation might be useful when a master file set is used.
I think you could find some inspiration by looking at what “Reasonable NoClone” does. The concept and presentation are interesting, but reliability is lacking when it comes to large numbers of files, and it is sometimes incomprehensibly slow, especially for people used to Syncovery's usually blazing-fast processing.
I hope things are a bit clearer. Never hesitate to come back with more questions.
Louis