Thread safety when uploading files from multiple threads or processes #1422

@narugo1992

Is your feature request related to a problem? Please describe.
I get HTTP 412 (Precondition Failed) errors when uploading images to a dataset from multiple parallel runners: https://github.com/narugo1992/gchar/actions/runs/4599743391/jobs/8125512962#step:14:272

The error is most likely caused by conflicts between submissions that arrive at the same time during concurrent uploading.

Describe the solution you'd like
One solution I can think of: when a 412 error caused by concurrent uploading is detected, the upload is retried automatically. Since retrying is not appropriate in every case, this behavior could be made optional.
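To make the request concrete, here is a minimal sketch of the kind of optional automatic retry described above. It is not the library's API: `retry_on_conflict`, `_has_status_412`, and all parameters are hypothetical names, and it assumes the raised exception exposes the HTTP status either as `exc.response.status_code` (as `requests.HTTPError` does) or as `exc.status_code`.

```python
import random
import time


def _has_status_412(exc):
    """Assumption: the status code lives on exc.response or on exc itself."""
    response = getattr(exc, "response", None)
    status = getattr(response, "status_code", None)
    if status is None:
        status = getattr(exc, "status_code", None)
    return status == 412


def retry_on_conflict(upload, max_retries=5, base_delay=0.5,
                      is_conflict=None, sleep=time.sleep):
    """Call upload() and retry with exponential backoff on a 412 conflict.

    `upload` is any zero-argument callable that performs one upload attempt.
    Exceptions that are not conflicts are re-raised immediately.
    """
    if is_conflict is None:
        is_conflict = _has_status_412

    for attempt in range(max_retries + 1):
        try:
            return upload()
        except Exception as exc:
            # Give up on the last attempt or on any non-conflict error.
            if attempt == max_retries or not is_conflict(exc):
                raise
            # Backoff with jitter so parallel runners do not retry in lockstep.
            delay = base_delay * (2 ** attempt) * (0.5 + random.random())
            sleep(delay)
```

Wrapping each upload call in `retry_on_conflict(lambda: ...)` keeps the retry opt-in: callers that should fail fast simply do not use the wrapper, or pass `max_retries=0`.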

Describe alternatives you've considered
Alternatively, a specific exception type could be raised when such an error occurs, together with a function for manually refreshing and retrying. The user could then decide whether or not to retry.
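A rough sketch of what this alternative could look like from the user's side. Everything here is hypothetical: `UploadConflictError` stands in for the dedicated exception type, and `upload` and `refresh` are caller-supplied callables standing in for the upload call and the manual refresh function.

```python
class UploadConflictError(Exception):
    """Hypothetical exception raised when an upload fails with HTTP 412."""

    def __init__(self, message, status_code=412):
        super().__init__(message)
        self.status_code = status_code


def upload_with_manual_retry(upload, refresh, max_attempts=3):
    """Retry only on UploadConflictError, refreshing state between attempts.

    `upload` performs one upload attempt; `refresh` re-reads whatever remote
    state the upload depends on. Both are supplied by the caller.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return upload()
        except UploadConflictError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the conflict to the caller
            refresh()
```

Because the conflict has its own type, a caller who does not want retries can catch `UploadConflictError` and handle it differently, while all other errors propagate unchanged.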

Additional context
This scenario is common when handling a large number of files with a large total size, where the data must be updated automatically on a schedule by an online platform such as GitHub Actions. Because online runners offer limited disk space, resources have to be generated and uploaded at the same time to keep disk usage low. Under these circumstances, stable concurrent uploading or processing (such as deleting) of data in the dataset would make it much easier to deploy automatic dataset updates.
