Description
Describe the bug
We would like to share models across users. To this end, we configured HF_HUB_CACHE, which worked well for a while. However, we then started to run into PermissionError exceptions for files in .locks.
The problem seems to be due to mixed group permissions in .locks. I'm attaching the list of artifacts for one affected model below, but we see the problem for other models, too. The output of umask is 0002 for all users of the system.
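For context, the permission bits of a newly created file are the requested mode with the umask bits cleared. The sketch below illustrates this with os.open, the call that appears at the bottom of the stack trace in the Logs section. The value 0o644 is an assumption about the mode the lock file is created with (the trace only shows self._context.mode), and created_mode and create_and_stat are hypothetical helper names used for illustration.

```python
import os
import stat
import tempfile


def created_mode(requested_mode: int, umask: int) -> int:
    """Permission bits a new file receives: the requested mode minus the umask bits."""
    return requested_mode & ~umask


def create_and_stat(requested_mode: int, umask: int) -> int:
    """Create a file with os.open under the given umask and return its permission bits."""
    previous = os.umask(umask)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "example.lock")
            fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, requested_mode)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        # Restore the caller's umask whatever happens.
        os.umask(previous)


if __name__ == "__main__":
    # Assumed modes, for illustration only.
    for mode in (0o644, 0o664, 0o666):
        effective = created_mode(mode, 0o002)
        print(oct(mode), "->", stat.filemode(stat.S_IFREG | effective))
```

Under these assumptions, a umask of 0002 cannot add group write to a file that was requested with mode 0o644: the umask only removes bits, it never grants them.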
Questions:
- Is setting HF_HUB_CACHE sufficient for sharing the hub cache across users?
- If I understand correctly, the lock files should be released after use. However, they are not actually deleted by FileLock, which may explain the problem we are facing. The relevant logic seems to be here:
huggingface_hub/src/huggingface_hub/utils/_fixes.py
Lines 115 to 121 in 476fa0b
try:
    return lock.release()
except OSError:
    try:
        Path(lock_file).unlink()
    except OSError:
        pass
A workaround would be to delete the files in .locks, but not all users have permission to do that, and asking each individual user to delete their own files is tedious. So I'm curious to hear your thoughts on this scenario. Thanks!
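As an alternative to deleting lock files, each user could add the group-write bit to the lock files they own. The following is a minimal sketch of that idea, not something huggingface_hub provides: add_group_write_to_own_locks is a hypothetical helper, and it assumes lock files end in .lock and live under the .locks directory of the shared cache.

```python
import os
import stat
from pathlib import Path


def add_group_write_to_own_locks(locks_dir: str) -> list[str]:
    """Add the group-write bit to every *.lock file under locks_dir owned by the caller."""
    changed = []
    uid = os.getuid()
    for path in Path(locks_dir).rglob("*.lock"):
        info = path.stat()
        if info.st_uid != uid:
            # Only the owner (or root) is allowed to chmod a file.
            continue
        mode = stat.S_IMODE(info.st_mode)
        if not mode & stat.S_IWGRP:
            path.chmod(mode | stat.S_IWGRP)
            changed.append(str(path))
    return changed


if __name__ == "__main__":
    # Assumes HF_HUB_CACHE points at the shared cache, as in this report.
    locks = os.path.join(os.environ["HF_HUB_CACHE"], ".locks")
    for fixed in add_group_write_to_own_locks(locks):
        print("fixed", fixed)
```

This still requires every user to run it once (or on a schedule), so it mitigates the symptom rather than preventing new lock files from being created without group write.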
Reproduction
No response
Logs
Here is a full stack trace and a list of the artifacts with permission mismatch.
$ python -c "import transformers; transformers.AutoTokenizer.from_pretrained('meta-llama/Meta-Llama-3.1-8B-Instruct')"
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/trienes/.conda/envs/test/lib/python3.12/site-packages/transformers/models/auto/tokenization_auto.py", line 844, in from_pretrained
tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/trienes/.conda/envs/test/lib/python3.12/site-packages/transformers/models/auto/tokenization_auto.py", line 676, in get_tokenizer_config
resolved_config_file = cached_file(
^^^^^^^^^^^^
File "/home/trienes/.conda/envs/test/lib/python3.12/site-packages/transformers/utils/hub.py", line 403, in cached_file
resolved_file = hf_hub_download(
^^^^^^^^^^^^^^^^
File "/home/trienes/.conda/envs/test/lib/python3.12/site-packages/huggingface_hub/utils/_deprecation.py", line 101, in inner_f
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/trienes/.conda/envs/test/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/trienes/.conda/envs/test/lib/python3.12/site-packages/huggingface_hub/file_download.py", line 1232, in hf_hub_download
return _hf_hub_download_to_cache_dir(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/trienes/.conda/envs/test/lib/python3.12/site-packages/huggingface_hub/file_download.py", line 1380, in _hf_hub_download_to_cache_dir
with WeakFileLock(lock_path):
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/trienes/.conda/envs/test/lib/python3.12/contextlib.py", line 137, in __enter__
return next(self.gen)
^^^^^^^^^^^^^^
File "/home/trienes/.conda/envs/test/lib/python3.12/site-packages/huggingface_hub/utils/_fixes.py", line 98, in WeakFileLock
lock.acquire()
File "/home/trienes/.local/lib/python3.12/site-packages/filelock/_api.py", line 295, in acquire
self._acquire()
File "/home/trienes/.local/lib/python3.12/site-packages/filelock/_unix.py", line 42, in _acquire
fd = os.open(self.lock_file, open_flags, self._context.mode)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
PermissionError: [Errno 13] Permission denied: '/scratch_shared/ag_seifertg/.cache/huggingface/hub/.locks/models--meta-llama--Meta-Llama-3.1-8B-Instruct/db88166e2bc4c799fd5d1ae643b75e84d03ee70e.lock'
"blob" files get group read-write:
ls -la $HF_HUB_CACHE/models--meta-llama--Meta-Llama-3.1-8B-Instruct/blobs/
total 15672361
drwxrwsr-x 2 trienes ag_seifertg 9 Sep 23 17:11 .
drwxrwsr-x 6 trienes ag_seifertg 6 Sep 30 16:30 ..
-rw-rw-r-- 1 trienes ag_seifertg 4999802720 Sep 23 17:10 09d433f650646834a83c580877bd60c6d1f88f7755305c12576b5c7058f9af15
-rw-rw-r-- 1 trienes ag_seifertg 855 Sep 23 16:55 0bb6fd75b3ad2fe988565929f329945262c2814e
-rw-rw-r-- 1 trienes ag_seifertg 23950 Sep 23 17:08 0fd8120f1c6acddc268ebc2583058efaf699a771
-rw-rw-r-- 1 trienes ag_seifertg 4976698672 Sep 23 17:09 2b1879f356aed350030bb40eb45ad362c89d9891096f79a3ab323d3ba5607668
-rw-rw-r-- 1 trienes ag_seifertg 1168138808 Sep 23 17:10 92ecfe1a2414458b4821ac8c13cf8cb70aed66b5eea8dc5ad9eeb4ff309d6d7b
-rw-rw-r-- 1 trienes ag_seifertg 184 Sep 23 17:11 cc7276afd599de091142c6ed3005faf8a74aa257
-rw-rw-r-- 1 trienes ag_seifertg 4915916176 Sep 23 17:10 fc1cdddd6bfa91128d6e94ee73d0ce62bfcdb7af29e978ddcab30c66ae9ea7fa
The ".locks" files, however, do not all get the same set of permissions:
ls -la $HF_HUB_CACHE/.locks/models--meta-llama--Meta-Llama-3.1-8B-Instruct/
total 96
drwxrwsr-x 2 trienes ag_seifertg 13 Aug 5 15:36 .
drwxrwsr-x 50 trienes ag_seifertg 50 Sep 30 17:10 ..
-rw-rw-r-- 1 trienes ag_seifertg 0 Jul 25 10:32 02ee80b6196926a5ad790a004d9efd6ab1ba6542.lock
-rw-rw-r-- 1 trienes ag_seifertg 0 Jul 25 10:30 09d433f650646834a83c580877bd60c6d1f88f7755305c12576b5c7058f9af15.lock
-rw-rw-r-- 1 trienes ag_seifertg 0 Jul 25 10:29 0bb6fd75b3ad2fe988565929f329945262c2814e.lock
-rw-rw-r-- 1 trienes ag_seifertg 0 Jul 25 10:29 0fd8120f1c6acddc268ebc2583058efaf699a771.lock
-rw-rw-r-- 1 trienes ag_seifertg 0 Jul 25 10:29 2b1879f356aed350030bb40eb45ad362c89d9891096f79a3ab323d3ba5607668.lock
-rw-rw-r-- 1 trienes ag_seifertg 0 Jul 25 10:32 421cda369d1e01e742b01d82e3a39c7cc82a8586.lock
-rw-rw-r-- 1 trienes ag_seifertg 0 Jul 25 10:32 5cc5f00a5b203e90a27a3bd60d1ec393b07971e8.lock
-rw-rw-r-- 1 trienes ag_seifertg 0 Jul 25 10:32 92ecfe1a2414458b4821ac8c13cf8cb70aed66b5eea8dc5ad9eeb4ff309d6d7b.lock
-rw-rw-r-- 1 trienes ag_seifertg 0 Jul 25 10:32 cc7276afd599de091142c6ed3005faf8a74aa257.lock
-rw-r--r-- 1 derzhana ag_seifertg 0 Aug 5 15:36 db88166e2bc4c799fd5d1ae643b75e84d03ee70e.lock
^^^^^^^^^^^^^^ here is the conflicting file
-rw-rw-r-- 1 trienes ag_seifertg 0 Jul 25 10:31 fc1cdddd6bfa91128d6e94ee73d0ce62bfcdb7af29e978ddcab30c66ae9ea7fa.lock
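Spotting such files by eye gets tedious with many models in the cache. The sketch below shows one way to list every lock file that lacks the group-write bit, together with its owner. The function name find_locks_without_group_write is hypothetical, and the .lock suffix is assumed from the listing above.

```python
import pwd
import stat
from pathlib import Path


def find_locks_without_group_write(locks_dir: str) -> list[tuple[str, str, str]]:
    """Return (path, owner, mode string) for each *.lock file lacking group write."""
    findings = []
    for path in sorted(Path(locks_dir).rglob("*.lock")):
        info = path.stat()
        if info.st_mode & stat.S_IWGRP:
            continue  # group members can already open this file for writing
        try:
            owner = pwd.getpwuid(info.st_uid).pw_name
        except KeyError:
            # The uid has no passwd entry; fall back to the numeric id.
            owner = str(info.st_uid)
        findings.append((str(path), owner, stat.filemode(info.st_mode)))
    return findings


if __name__ == "__main__":
    import os

    # Assumes HF_HUB_CACHE points at the shared cache, as in this report.
    locks = os.path.join(os.environ["HF_HUB_CACHE"], ".locks")
    for path, owner, mode in find_locks_without_group_write(locks):
        print(mode, owner, path)
```

The owner column tells you whom to ask to fix or remove each file, since only the owner (or root) can change its permissions.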
System info
- huggingface_hub version: 0.25.1
- Platform: Linux-4.18.0-425.3.1.el8.x86_64-x86_64-with-glibc2.28
- Python version: 3.12.6
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Running in Google Colab Enterprise ?: No
- Token path ?: /home/trienes/.cache/huggingface/token
- Has saved token ?: True
- Who am I ?: jantrienes
- Configured git credential helpers:
- FastAI: N/A
- Tensorflow: N/A
- Torch: N/A
- Jinja2: N/A
- Graphviz: N/A
- keras: N/A
- Pydot: N/A
- Pillow: N/A
- hf_transfer: N/A
- gradio: N/A
- tensorboard: N/A
- numpy: 2.1.1
- pydantic: N/A
- aiohttp: N/A
- ENDPOINT: https://huggingface.co
- HF_HUB_CACHE: /scratch_shared/ag_seifertg/.cache/huggingface/hub
- HF_ASSETS_CACHE: /home/trienes/.cache/huggingface/assets
- HF_TOKEN_PATH: /home/trienes/.cache/huggingface/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10