
Disable uploading somehow #2755

Open
colindean opened this issue Jan 16, 2025 · 1 comment

Comments

@colindean

Is your feature request related to a problem? Please describe.

My team is concerned about accidentally uploading our (internal) models to the Hub. It's not a matter of if, but when. We're looking for ways to make it harder for people to accidentally do things that could be laborious to clean up at minimum or constitute a breach of company information security.

Describe the solution you'd like

Having little familiarity with the codebase, perhaps an environment variable would suffice.

Based on a really quick review of the codebase, I think these are the functions/methods that would need a check of that envvar, as early as possible so no action is taken:

def upload_large_folder_internal(

def __init__(self, args: Namespace) -> None:

def __init__(self, args: Namespace) -> None:

def upload_folder(
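To make the idea concrete, here is a minimal sketch of such a check. The variable name `HF_HUB_DISABLE_UPLOADS`, the `guard_upload` decorator, and the stand-in `upload_folder` below are all hypothetical; none of them exist in huggingface_hub today. It only illustrates checking the variable before any upload action is taken.

```python
import os
from functools import wraps

# Hypothetical variable name: huggingface_hub does not define it.
DISABLE_UPLOADS_ENVVAR = "HF_HUB_DISABLE_UPLOADS"


class UploadsDisabledError(RuntimeError):
    """Raised when an upload is attempted while uploads are disabled."""


def uploads_disabled() -> bool:
    """Return True if the (hypothetical) envvar is set to a truthy value."""
    value = os.environ.get(DISABLE_UPLOADS_ENVVAR, "")
    return value.strip().lower() in {"1", "true", "yes", "on"}


def guard_upload(func):
    """Refuse to call `func` when uploads are disabled, before any work is done."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        if uploads_disabled():
            raise UploadsDisabledError(
                f"{func.__name__} blocked: {DISABLE_UPLOADS_ENVVAR} is set"
            )
        return func(*args, **kwargs)

    return wrapper


@guard_upload
def upload_folder(*, repo_id: str, folder_path: str) -> str:
    # Stand-in for a real upload entry point; it only reports what it would do.
    return f"would upload {folder_path} to {repo_id}"
```

With the variable unset the call goes through as usual; with it set to `1`, the call raises before touching the network.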

Describe alternatives you've considered

A config file. It seems that the huggingface_hub library/CLI stores a token in ${XDG_CACHE_HOME}/huggingface/token, but there are no hits when searching the documentation for .config or variations on ${XDG_CONFIG_HOME}. This feature is not significant enough to warrant the introduction of a config file.

A flag file. Dovetailing off the above alternative, perhaps the presence of a file, e.g. ${XDG_CACHE_HOME}/huggingface/disable_uploads, would suffice. However, it's a heavier solution than an environment variable for macOS and Linux, our dev envs. Conversely, Windows users might prefer it, since setting environment variables on Windows is more complex.
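A rough sketch of the flag-file variant, for comparison. The file name `disable_uploads` and both helper functions are hypothetical, and the path resolution assumes the standard `XDG_CACHE_HOME` variable with a `~/.cache` fallback rather than whatever huggingface_hub does internally.

```python
import os
from pathlib import Path


def flag_file_path() -> Path:
    """Location of the (hypothetical) flag file under the XDG cache directory."""
    # Assumption: fall back to ~/.cache when XDG_CACHE_HOME is unset or empty.
    cache_home = os.environ.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(cache_home) / "huggingface" / "disable_uploads"


def uploads_disabled_by_flag_file() -> bool:
    """Return True if the flag file exists; its contents are ignored."""
    return flag_file_path().exists()
```

An upload entry point would call `uploads_disabled_by_flag_file()` first and bail out if it returns True.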

Enterprise Hub. This is not a consideration for our team, as we're consuming public models and have solutions for handling any internally-trained models.

Internal fork with uploading code removed. I don't think anyone wants that maintenance burden 😉


@Wauplin
Contributor

Wauplin commented Jan 17, 2025

Hi @colindean, the best way to guarantee you won't push data to the Hub accidentally is to use fine-grained tokens. With a fine-grained token, you can precisely define what the token is able to do. If your only use case is to download models, you can set it up like this:

Image

Here is a link to create such a token: https://huggingface.co/settings/tokens/new?ownUserPermissions=repo.content.read&canReadGatedRepos=true&tokenType=fineGrained.


Regarding your feature request, we won't implement it client-side. As you can see on the "fine-grained tokens" page, the scope of things users might want to enable/disable is just too broad, and ever-changing.

Security and compliance in companies is a real topic and that's why we've built all sorts of tools for enterprises to configure:

  • fine-grained tokens: companies can force their users to use only fine-grained tokens. They can even activate a guardrail so that new tokens require admin approval.
  • repo visibility: by default, repositories are public. Organizations can decide to make all repositories "private by default" to prevent accidentally uploading confidential resources in the open. Another setting is to authorize only private repos.
  • finally, for Enterprise+ orgs, there is a network security layer that lets admins define a content access policy. This policy consists of a list of allowed/blocked URLs. You can for instance entirely block the "/commit" API.

All of these security features are built into the Hub and not client-side. This is more robust, as it doesn't depend on the library versions, environment variables, configs, etc. of your users. The features I mentioned above are bundled in the Enterprise Hub offering (except the last one, which requires contacting the sales team).
