Can't copy files with CommitOperationCopy when there are too many of them #1503

@lhoestq

Calling create_commit with CommitOperationCopy operations on https://huggingface.co/datasets/tiiuae/falcon-refinedweb raises a 413 error:

Traceback (most recent call last):
File "/src/services/worker/.venv/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py", line 259, in hf_raise_for_status
 response.raise_for_status()
File "/src/services/worker/.venv/lib/python3.9/site-packages/requests/models.py", line 1021, in raise_for_status
 raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 413 Client Error: Payload Too Large for url: https://huggingface.co/api/datasets/tiiuae/falcon-refinedweb/paths-info/d4d0c8a489e10bb4fbce947d16811b8b8eb544f5

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/src/services/worker/src/worker/job_manager.py", line 163, in process
 job_result = self.job_runner.compute()
File "/src/services/worker/src/worker/job_runners/config/parquet_and_info.py", line 931, in compute
 compute_config_parquet_and_info_response(
File "/src/services/worker/src/worker/job_runners/config/parquet_and_info.py", line 867, in compute_config_parquet_and_info_response
 committer_hf_api.create_commit(
File "/src/services/worker/.venv/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
 return fn(*args, **kwargs)
File "/src/services/worker/.venv/lib/python3.9/site-packages/huggingface_hub/hf_api.py", line 828, in _inner
 return fn(self, *args, **kwargs)
File "/src/services/worker/.venv/lib/python3.9/site-packages/huggingface_hub/hf_api.py", line 2687, in create_commit
 files_to_copy = fetch_lfs_files_to_copy(
File "/src/services/worker/.venv/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
 return fn(*args, **kwargs)
File "/src/services/worker/.venv/lib/python3.9/site-packages/huggingface_hub/_commit_api.py", line 533, in fetch_lfs_files_to_copy
 for src_repo_file in src_repo_files:
File "/src/services/worker/.venv/lib/python3.9/site-packages/huggingface_hub/hf_api.py", line 2041, in list_files_info
 hf_raise_for_status(response)
File "/src/services/worker/.venv/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py", line 301, in hf_raise_for_status
 raise HfHubHTTPError(str(e), response=response) from e
huggingface_hub.utils._errors.HfHubHTTPError: 413 Client Error: Payload Too Large for url: https://huggingface.co/api/datasets/tiiuae/falcon-refinedweb/paths-info/d4d0c8a489e10bb4fbce947d16811b8b8eb544f5 (Request ID: Root=1-6486dcf3-4f1942863e61c6e5073aa26e)
too many parameters
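
One possible workaround, sketched here on the assumption that the 413 comes from sending every source path in a single paths-info request: split the paths into fixed-size batches and query each batch separately. The helpers `chunked` and `fetch_in_batches`, the `fetch_batch` callable, and the batch size of 1000 are hypothetical names and values chosen for illustration; none of them are part of huggingface_hub.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def fetch_in_batches(
    paths: Iterable[str],
    fetch_batch: Callable[[List[str]], Iterable[R]],
    batch_size: int = 1000,  # assumed value; the real server limit is not stated in this issue
) -> List[R]:
    """Call `fetch_batch` once per batch of paths and concatenate the results.

    `fetch_batch` is a hypothetical stand-in for whatever function issues
    one paths-info request for a list of paths.
    """
    results: List[R] = []
    for batch in chunked(paths, batch_size):
        results.extend(fetch_batch(batch))
    return results
```

With 2,500 paths and a batch size of 1000, this issues three requests (1000, 1000, and 500 paths) instead of one oversized request.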
