Is your feature request related to a problem? Please describe.
My team is concerned about accidentally uploading our (internal) models to the Hub. It's not a matter of if, but when. We're looking for ways to make it harder for people to accidentally do things that could be laborious to clean up at minimum or constitute a breach of company information security.
Describe the solution you'd like
I have little familiarity with the codebase, but perhaps an environment variable would suffice.
Based on a quick review of the codebase, I think these are the functions/methods that would need to check that envvar, as early as possible so that no action is taken:
huggingface_hub/src/huggingface_hub/_upload_large_folder.py, line 48 in 7123262
huggingface_hub/src/huggingface_hub/commands/upload.py, line 133 in 7123262
huggingface_hub/src/huggingface_hub/commands/upload_large_folder.py, line 69 in 7123262
huggingface_hub/src/huggingface_hub/hf_api.py, line 4268 in 7123262
huggingface_hub/src/huggingface_hub/hf_api.py, line 4476 in 7123262
Describe alternatives you've considered
A config file. It seems that the huggingface_hub library/CLI stores a token in ${XDG_CACHE_DIR}/huggingface/token, but there are no hits when searching the documentation for .config or variations on ${XDG_CONFIG_DIR}. This feature alone is not significant enough to warrant introducing a config file.
A flag file. Dovetailing off the above alternative, perhaps the presence of a file, e.g. ${XDG_CACHE_DIR}/huggingface/disable_uploads, would suffice. However, it's a heavier solution than an environment variable for macOS and Linux, our dev environments. Conversely, Windows users might prefer it, since setting environment variables on Windows is more involved.
Enterprise Hub. This is not a consideration for our team, as we're consuming public models and have solutions for handling any internally-trained models.
Internal fork with uploading code removed. I don't think anyone wants that maintenance burden 😉
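To make the proposal concrete, here is a minimal sketch of the kind of client-side guard an environment variable could provide. The variable name HF_HUB_DISABLE_UPLOADS and the helper names are hypothetical illustrations, not part of huggingface_hub:

```python
import os


class UploadsDisabledError(RuntimeError):
    """Raised when uploads are blocked by local policy."""


def _check_uploads_allowed() -> None:
    # Hypothetical variable name; any truthy value blocks uploads.
    if os.environ.get("HF_HUB_DISABLE_UPLOADS", "").lower() in ("1", "true", "yes"):
        raise UploadsDisabledError(
            "Uploads to the Hub are disabled by HF_HUB_DISABLE_UPLOADS"
        )


def upload_file(path: str, repo_id: str) -> str:
    # The guard runs before any network action is taken,
    # mirroring the "as early as possible" requirement above.
    _check_uploads_allowed()
    return f"would upload {path} to {repo_id}"
```

Each upload entry point listed above would call the same guard first, so a single environment variable covers the CLI and the Python API alike.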
Hi @colindean, the best way to guarantee you won't push data to the Hub accidentally is to use fine-grained tokens. With a fine-grained token, you can precisely define what the token is able to do. If your only use case is downloading models, you can scope the token to read-only access.
Regarding your feature request, we won't implement it client-side. As you can see on the "fine-grained tokens" page, the scope of things users might want to enable or disable is simply too broad, and ever-changing.
Security and compliance are real concerns for companies, which is why we've built a range of tools that enterprises can configure:
fine-grained tokens: companies can force their users to use only fine-grained tokens. They can even activate a guardrail so that new tokens require admin approval.
repo visibility: by default, repositories are public. Organizations can decide to make all repositories "private by default" to prevent accidentally uploading confidential resources in the open. Another setting authorizes only private repos.
finally, for Enterprise+ orgs, there is a network security layer that lets admins define a content access policy. This policy consists of a list of allowed/blocked URLs. For instance, you can entirely block the "/commit" API.
All of these security features are built into the Hub rather than the client. This is more robust, as it doesn't depend on the library versions, environment variables, or configs of your users. The features mentioned above are bundled in the Enterprise Hub offering (except the last one, which requires contacting the sales team).
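Conceptually, the content access policy described above amounts to URL filtering at the network layer. A simplified sketch of the idea, with a hypothetical blocklist that stops the "/commit" API while leaving downloads untouched:

```python
from urllib.parse import urlparse

# Hypothetical policy: path segments that an admin has chosen to block.
BLOCKED_PATH_SEGMENTS = ("/commit",)


def is_request_allowed(url: str) -> bool:
    """Return False if the URL path contains any blocked segment."""
    path = urlparse(url).path
    return not any(segment in path for segment in BLOCKED_PATH_SEGMENTS)
```

Because the filtering happens on the network path rather than in the client library, it applies regardless of which huggingface_hub version, environment variables, or config files a user happens to have.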