This is the big difference between the two versions, and I'm still not sure which is the 'better' option.
Using a multiprocessing pool to create workers that make the raw swift library calls might be more efficient, but not as 'safe'. We could reuse the code from before the PR.
Using subcommand to fire off batches of 'swift upload...' shell commands is less efficient, but 'safer'.
The latest PR was a big downgrade, in terms of speed. I definitely want to fix that.
While doing the large batch upload for #9, it's become pretty clear that a single thread calling 'swift upload' millions of times isn't going to cut it. Way too slow. Even if we don't call the library directly, and continue to use subcommand, we'll need a multiprocess pool to call 'swift upload' in parallel. We can use existing code in bulkupload.py.
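A rough sketch of what the parallel version could look like, assuming Python's `subprocess` module and `multiprocessing.Pool`, and assuming the `swift` CLI picks up its credentials from the environment. The function names, `processes` and `batch_size` here are made up for illustration and are not taken from bulkupload.py:

```python
import subprocess
from functools import partial
from multiprocessing import Pool


def batches(paths, batch_size):
    """Yield successive lists of at most batch_size paths."""
    for start in range(0, len(paths), batch_size):
        yield paths[start:start + batch_size]


def build_command(container, batch):
    """Build one 'swift upload' invocation covering a whole batch of files."""
    return ["swift", "upload", container, *batch]


def upload_batch(container, batch):
    """Run one 'swift upload' command and return (return code, batch)."""
    result = subprocess.run(build_command(container, batch),
                            capture_output=True, text=True)
    return result.returncode, batch


def parallel_upload(container, paths, processes=8, batch_size=100):
    """Upload paths using a pool of workers; return the batches that failed."""
    worker = partial(upload_batch, container)
    failed = []
    with Pool(processes=processes) as pool:
        # imap_unordered hands batches to workers as they become free.
        for returncode, batch in pool.imap_unordered(worker, batches(paths, batch_size)):
            if returncode != 0:
                failed.append(batch)
    return failed
```

Calling something like `parallel_upload("my-container", list_of_paths)` from under an `if __name__ == "__main__":` guard would return the batches whose command exited non-zero, so they can be retried. Passing many files to each `swift upload` call also cuts down on the per-process startup cost, which probably matters a lot for 1-2 kb files.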
On my test VM, uploading 1,000,000 files of 1-2 KB each to a container using swift upload took 838m28.226s. We should aim for this tool's run time to fall within 110% of that.
For directories with 20+ million files, multiple processes will be necessary to complete the upload within a reasonable amount of time.
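To put numbers on that, here is a quick back-of-the-envelope calculation from the 838m28.226s baseline. It assumes "within 110%" means a run time of at most 1.1 times the baseline, and that throughput stays flat as the file count grows, which may not hold in practice:

```python
# Baseline from the test VM: 1,000,000 files in 838m28.226s.
BASELINE_SECONDS = 838 * 60 + 28.226
BASELINE_FILES = 1_000_000


def files_per_second(files, seconds):
    """Average upload throughput."""
    return files / seconds


def single_process_days(total_files, rate):
    """Days needed to upload total_files at a constant rate (files/s)."""
    return total_files / rate / 86400


rate = files_per_second(BASELINE_FILES, BASELINE_SECONDS)
target_seconds = BASELINE_SECONDS * 1.10  # assumed reading of "within 110%"
days_for_20m = single_process_days(20_000_000, rate)

print(f"baseline rate: {rate:.1f} files/s")
print(f"110% target:   {target_seconds / 60:.0f} minutes per 1,000,000 files")
print(f"20M files:     {days_for_20m:.1f} days at the baseline rate")
```

That works out to about 20 files per second, a target of roughly 922 minutes per million files, and more than 11 days for 20 million files at the baseline rate.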