
Use DefaultAzureCredential by default for Azure paths #497

Open
daviewales opened this issue Jan 22, 2025 · 3 comments


@daviewales

daviewales commented Jan 22, 2025

Some libraries, such as polars and pandas, have an almost seamless method for interacting with cloud storage paths.

e.g.:

import polars as pl
pl.scan_csv('az://container/path/to/file.csv', storage_options={'account_name': 'mystorageaccount'}).collect()

This is nice, because I don't need to import any other libraries, set up credentials or blob clients, etc.
It automatically finds any available credentials in my local environment, presumably with something like DefaultAzureCredential.
This means that when testing locally, I just need to be authenticated with Azure CLI, and everything just works.
I don't even need to manually specify environment variables.
It also means that I can deploy the same code to the server, and it will automatically find the appropriate environment variables to authenticate as a service principal with AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, etc.

I may have missed something, but it seems that cloudpathlib has not enabled this kind of automatic credential detection with DefaultAzureCredential. Instead, I need to do the following to get an authenticated, working CloudPath:

from azure.identity import DefaultAzureCredential
from cloudpathlib import CloudPath, AzureBlobClient

credential = DefaultAzureCredential()
client = AzureBlobClient(account_url="https://mystorageaccount.blob.core.windows.net", credential=credential)

path = CloudPath('az://container/path/to/file.csv', client=client)

Ideally, it would be nice to be able to do the setup automatically.
I'm imagining the following future state:

from cloudpathlib import CloudPath
path = CloudPath('az://container/path/to/file.csv', storage_options={'account_name': 'mystorageaccount'})
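Until something like this exists, a thin wrapper can approximate it. The following is a minimal sketch, not part of cloudpathlib: the helper names `account_url_from_options` and `default_credential_path` are hypothetical, and the wrapper relies only on the `AzureBlobClient(account_url=..., credential=...)` and `CloudPath(..., client=...)` calls shown earlier in this issue. It also assumes the standard public `blob.core.windows.net` endpoint.

```python
from typing import Any, Mapping


def account_url_from_options(storage_options: Mapping[str, Any]) -> str:
    """Build a blob endpoint URL from polars/pandas-style storage options.

    Hypothetical helper; assumes the public Azure cloud endpoint suffix.
    """
    account_name = storage_options.get("account_name")
    if not account_name:
        raise ValueError("storage_options must include 'account_name'")
    return f"https://{account_name}.blob.core.windows.net"


def default_credential_path(uri: str, storage_options: Mapping[str, Any]):
    """Return a CloudPath backed by DefaultAzureCredential (hypothetical wrapper)."""
    # Imported lazily so the URL helper above works without the Azure packages.
    from azure.identity import DefaultAzureCredential
    from cloudpathlib import AzureBlobClient, CloudPath

    client = AzureBlobClient(
        account_url=account_url_from_options(storage_options),
        credential=DefaultAzureCredential(),
    )
    return CloudPath(uri, client=client)
```

Usage would then look like `default_credential_path('az://container/path/to/file.csv', {'account_name': 'mystorageaccount'})`, which mirrors the proposed API while keeping the credential setup in one place.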

(There may be a nicer way to specify the account name. I'm just copying the API from polars and pandas here. I kind of wish that it was standard to include the account name in the path somehow, as passing the account name in separately feels clunky to me. It would be nice if we could use az://mystorageaccount/container/...)
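To make the account-in-path idea concrete, here is a rough sketch of how such a URI could be split. This is purely illustrative: the `az://account/container/...` form is the hypothetical layout suggested above, not something cloudpathlib, polars, or pandas accept today, and `AzureLocation` and `parse_account_qualified_uri` are made-up names. The endpoint suffix assumes the public Azure cloud.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class AzureLocation(NamedTuple):
    account_url: str
    container: str
    blob: str


def parse_account_qualified_uri(uri: str) -> AzureLocation:
    """Split a hypothetical az://<account>/<container>/<blob> URI."""
    parts = urlsplit(uri)
    if parts.scheme != "az":
        raise ValueError(f"expected an az:// URI, got {uri!r}")
    account = parts.netloc
    # The first path segment is the container; the remainder is the blob key.
    container, _, blob = parts.path.lstrip("/").partition("/")
    if not account or not container:
        raise ValueError(f"URI must include an account and a container: {uri!r}")
    return AzureLocation(
        account_url=f"https://{account}.blob.core.windows.net",
        container=container,
        blob=blob,
    )
```

One caveat with this layout is that it would be ambiguous with the existing `az://container/...` form, so a real implementation would need some way to tell the two apart.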

See the documentation for DefaultAzureCredential. (There's a reason it's called Default!)

Note: If you are using fsspec + adlfs, adlfs requires the storage option anon=False to be set to enable DefaultAzureCredential.

For example, when using pandas, you must specify storage_options={'anon': False}.
When using fsspec directly, you need to pass it as follows:

import fsspec
fs = fsspec.filesystem('az', account_name='mystorageaccount', anon=False)

For more details, see:
https://github.com/fsspec/adlfs#setting-credentials

@pjbull
Member

pjbull commented Jan 30, 2025

@daviewales Thanks for filing this issue. Agreed it would be great to support DefaultAzureCredential.

Right now the fall-through logic for getting a credential if/when we need one is complex:

https://github.com/drivendataorg/cloudpathlib/blob/master/cloudpathlib/azure/azblobclient.py#L118-L203

I think the major work here is to spec out the various credentialing scenarios and what we should do in each so that we can implement and test the logic effectively and comprehensively.

Is that potentially a contribution you could help with?

@daviewales
Author

daviewales commented Jan 30, 2025

I'll see if I get a chance to have a closer look. My initial feeling is that we just need to check if credential is None just before this line:

elif connection_string is not None:

and if so set:

credential = DefaultAzureCredential()

Other things to consider:

  • Does DefaultAzureCredential raise an exception if it can't find a credential? (Need to catch)
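The fall-through idea above could be sketched roughly as follows. This is a simplified illustration, not the actual cloudpathlib logic: `choose_credential` is a hypothetical function, the precedence order (explicit credential, then connection string, then default) is an assumption, and the default credential is passed in as a factory so the ordering can be tested without the Azure packages. In real use the factory would be `DefaultAzureCredential`. On the exception question, my understanding is that constructing `DefaultAzureCredential()` does not contact Azure, and that failures surface as `azure.core.exceptions.ClientAuthenticationError` on the first token request, but that should be verified against the azure-identity documentation.

```python
from typing import Any, Callable, Optional, Tuple


def choose_credential(
    credential: Optional[Any],
    connection_string: Optional[str],
    default_factory: Callable[[], Any],
) -> Tuple[Optional[Any], str]:
    """Return (credential, source) under an assumed precedence order."""
    if credential is not None:
        # An explicitly passed credential always wins.
        return credential, "explicit"
    if connection_string is not None:
        # Leave the credential unset; the connection string is used instead.
        return None, "connection_string"
    # Nothing was supplied, so fall back to the default credential chain.
    return default_factory(), "default"
```

Keeping the factory injectable also makes it straightforward to cover each scenario in unit tests with a stand-in object.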

@pjbull
Member

pjbull commented Jan 30, 2025

Great, thanks. I believe that credentials can be in the connection string, and I'm not sure how that would interact with also calling and passing DefaultAzureCredential.
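One way to reason about that interaction is to check whether a connection string already carries a secret before deciding to add a default credential. The sketch below is an assumption-laden illustration: `connection_string_has_credentials` is a hypothetical helper, and it only looks for the `AccountKey` and `SharedAccessSignature` fields of the usual `Key=Value;Key=Value` connection string format.

```python
def connection_string_has_credentials(connection_string: str) -> bool:
    """Return True if the connection string embeds an account key or SAS token."""
    fields = {}
    for part in connection_string.split(";"):
        if "=" in part:
            # Split on the first "=" only; base64 keys often end in "=" padding.
            key, _, value = part.partition("=")
            fields[key.strip().lower()] = value.strip()
    return any(fields.get(name) for name in ("accountkey", "sharedaccesssignature"))
```

A spec for the credentialing scenarios could then state, for example, that a default credential is only considered when this check returns False.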


2 participants