best practices for handling millions of files


Postby biodtl on Sun Apr 10, 2016 8:26 am

I have been using the Exact Mirror method for a number of years to back up an ever-increasing number of files. Currently there are about 1.5 million files totaling about 150 GB. This is a one-way backup to a remote server destination over SSH. The "Building File List" step is now taking an excessive amount of time to complete. I have altered a few settings and have that step down to about 6 hours. Here are the relevant job settings:

Version 7.47
Exact Mirror
File List Threads: 50
Cache Destination File List: Disabled
Program Settings - Advanced - Split jobs: Disabled

My requirements for this job are pretty simple: a once-per-day schedule, one-way copy to the remote server. I have separate snapshot images on the remote server for nightly/weekly images. Up-to-date deletions are not strictly required; if it would speed things up, I could run a separate job to check for deletions once a week. New files are added often, hundreds per day. Files rarely change. Deletions are slightly more common but still very seldom. Should I switch this to the Standard Copying method? How about a real-time sync method? If real-time sync avoids the "Building File List" step, it may work better, and I could have it run at a 10-20 minute interval. Are there any other options I should be looking at?
biodtl
 
Posts: 3
Joined: Sun Apr 10, 2016 7:21 am

Re: best practices for handling millions of files

Postby superflexible on Mon Apr 11, 2016 1:55 pm

I think all you need is Cache Destination File List. Since version 7.20, the cache is extremely fast: it can easily scan 1.5 million files in ten minutes or so, depending on the speed of your source storage. The first scan will not be fast, but subsequent ones will be. Make sure you remove the checkmarks "Double check the actual destination" as well as "Re-read the destination every X runs".
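The idea behind the cache, shown here as a rough Python sketch (for illustration only, not the actual implementation; the cache file name and the size/modtime check are invented for the example): once a snapshot of the destination has been saved locally, a run only needs to scan the source and compare against that snapshot, with no remote listing at all.

    import json, os

    CACHE = "dest_filelist.json"  # made-up local cache file for the example

    def load_cache():
        # path -> (size, mtime) recorded after the last successful run
        try:
            with open(CACHE) as f:
                return {p: tuple(v) for p, v in json.load(f).items()}
        except FileNotFoundError:
            return {}

    def files_to_copy(source_root, cache):
        # Scan only the local source; anything new or changed relative to
        # the cached destination listing needs to be uploaded.
        for dirpath, _dirs, names in os.walk(source_root):
            for name in names:
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, source_root)
                st = os.stat(full)
                if cache.get(rel) != (st.st_size, int(st.st_mtime)):
                    yield rel

After each successful copy, the cache entry for that file gets updated. That is why the first run still has to list the destination once, but every later run only pays for the local scan.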

You probably can't use 50 file listing threads with SFTP/SSH; try 8 or so.
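The reason is that every listing thread needs its own SSH session, and servers limit those (OpenSSH's MaxSessions default is 10, for example). Purely to illustrate the pattern outside of Syncovery, here is a rough Python sketch using the third-party paramiko library, with host, user and path as placeholders:

    import queue, stat, threading
    import paramiko  # third-party SSH library, used only for illustration

    HOST, USER = "backup.example.com", "backup"   # placeholders
    THREADS = 8   # realistic cap; 50 would hit typical server session limits

    dirs = queue.Queue()
    dirs.put("/data")                             # placeholder remote root
    listing, lock = [], threading.Lock()

    def worker():
        client = paramiko.SSHClient()
        client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        client.connect(HOST, username=USER)       # one SSH session per thread
        sftp = client.open_sftp()
        while True:
            try:
                path = dirs.get(timeout=5)        # crude idle detection
            except queue.Empty:
                break
            for entry in sftp.listdir_attr(path): # one round trip per folder
                full = path + "/" + entry.filename
                if stat.S_ISDIR(entry.st_mode):
                    dirs.put(full)
                else:
                    with lock:
                        listing.append((full, entry.st_size, entry.st_mtime))
        client.close()

    workers = [threading.Thread(target=worker) for _ in range(THREADS)]
    for t in workers: t.start()
    for t in workers: t.join()

The point of the sketch is the one round trip per directory: with 1.5 million files spread over many folders, per-request latency dominates, and adding threads beyond what the server allows just produces failed or queued sessions.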

If the destination is a Linux server, you can also try the recursive SSH listing option, which uses an SSH shell command.
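That way the server walks the tree itself and the whole listing comes back from a single command, instead of one SFTP round trip per directory. As a hypothetical equivalent (not necessarily the command Syncovery runs), GNU find over ssh can produce size, modification time and path for every file in one go; host and path below are placeholders:

    import subprocess

    # Hypothetical server-side recursive listing: a single ssh round trip
    # instead of millions of per-directory SFTP requests. Assumes GNU find
    # (for -printf) on the Linux server.
    out = subprocess.run(
        ["ssh", "backup@backup.example.com",
         "find /data -type f -printf '%s\\t%T@\\t%p\\n'"],
        capture_output=True, text=True, check=True,
    ).stdout

    remote = {}
    for line in out.splitlines():
        size, mtime, path = line.split("\t", 2)
        remote[path] = (int(size), float(mtime))  # size and mtime per file

Parsing a few million of these lines locally takes only seconds, so a single server-side walk can beat even a well-tuned multi-threaded SFTP scan.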

If the destination is a Windows server, you can install the Syncovery Remote Service to do the listing.

I think you will be happy with the destination file list cache and you can stick with once per day. Of course you can do real-time in addition to that.

Make sure you don't have any excessive logging enabled on the Logs tab sheet of the Program Settings dialog. Options like "With Timing Info" or "Internet Protocol Logging" can slow things way down.

Please also see https://www.syncovery.com/documentation/faq/fastlist/
superflexible
Site Admin
 
Posts: 2478
Joined: Thu Dec 31, 2009 3:08 pm

