Feat: improves delete_dir for s3fs-backed FsspecStore #2661
- Override the default Store.delete_dir method, which deletes keys one by one, to support bulk deletion for fsspec implementations that accept a list of paths in the fs._rm method.
- This can greatly reduce the number of requests to S3, which reduces the likelihood of running into throttling errors and improves delete performance.
- Currently, only s3fs is supported.
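The difference between the two deletion strategies can be sketched with a stand-in filesystem. This is only an illustration under stated assumptions: `FakeBulkFS`, `delete_dir_one_by_one`, `delete_dir_bulk`, and `count_requests` are hypothetical names, not the code in this PR, and counting each `_rm` call as one request is a simplification of what a real backend does.

```python
import asyncio


class FakeBulkFS:
    """Hypothetical in-memory stand-in for an async filesystem.

    It mimics only the two private methods named in this PR discussion
    (_find and _rm) and counts how many _rm calls were made.
    """

    def __init__(self, keys):
        self.keys = set(keys)
        self.requests = 0

    async def _find(self, prefix):
        # List every key stored under the given prefix.
        return sorted(k for k in self.keys if k.startswith(prefix))

    async def _rm(self, path):
        # Accept a single path or a list of paths.
        # Assumption: one call is treated as one request.
        paths = [path] if isinstance(path, str) else list(path)
        self.requests += 1
        self.keys.difference_update(paths)


async def delete_dir_one_by_one(fs, prefix):
    # Default-style behaviour: one remove call per key.
    for key in await fs._find(prefix):
        await fs._rm(key)


async def delete_dir_bulk(fs, prefix):
    # Bulk behaviour: hand the whole list of keys to a single remove call.
    keys = await fs._find(prefix)
    if keys:
        await fs._rm(keys)


def count_requests(delete_fn, n_keys=5):
    # Build a small store, delete one "directory", report requests and leftovers.
    fs = FakeBulkFS([f"root/a/c{i}" for i in range(n_keys)] + ["root/b/keep"])
    asyncio.run(delete_fn(fs, "root/a/"))
    return fs.requests, sorted(fs.keys)
```

Calling `count_requests(delete_dir_one_by_one)` and `count_requests(delete_dir_bulk)` leaves the same keys behind; only the number of remove calls differs.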
I have reservations about this code. It seems to me that calling ._rm()
should be all that's required, letting fsspec handle everything else.
…shadi/zarr-python into feat-fsspecstore-bulk-delete
Thanks @carshadi for continued efforts here. I'd like to see some fsspec specific tests here if possible.
Co-authored-by: Joe Hamman <[email protected]>
Looks good here!
@carshadi - the only thing this needs is a release note.
@martindurant - care to do the final review and/or merge?
tests/test_store/test_fsspec.py
"""The local fs is not async so we should expect it to be wrapped automatically"""
from fsspec.implementations.asyn_wrapper import AsyncFileSystemWrapper

store = FsspecStore.from_url(tmpdir / "test/path", storage_options={"auto_mkdir": True})
tmpdir / "test/path"
This is not a URL. I don't know what from_url demands, but f"{tmpdir}/test/path"
seems better to me.
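For reference, one way to turn a local path into an actual URL is `pathlib.Path.as_uri()`. The thread does not establish whether from_url needs this, so treat the following as a sketch only; `local_store_url` is a hypothetical helper and not part of zarr or fsspec.

```python
import tempfile
from pathlib import Path


def local_store_url(base_dir, relative="test/path"):
    """Return a file:// URL for a location under base_dir (hypothetical helper)."""
    # str() lets this accept path-like fixtures as well as plain strings.
    # as_uri() requires an absolute path, so resolve it first.
    return (Path(str(base_dir)) / relative).resolve().as_uri()


# Example with the system temporary directory standing in for a test tmpdir.
url = local_store_url(tempfile.gettempdir())
```

The resulting string always uses forward slashes and a `file://` scheme, regardless of platform.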
I used that method since it wraps the filesystem automatically, and figured that would be the entry point most people try first. FWIW, it seems to work with a local path. I could construct the filesystem manually if preferable, e.g.,
sync_fs = LocalFileSystem(auto_mkdir=True)
async_fs = AsyncFileSystemWrapper(sync_fs)
store = FsspecStore(async_fs, read_only=False, path=f"{tmpdir}/test/path")
it seems to work with a local path
On Windows?? :)
I'm running Windows 11, so apparently :)
Improves performance of FsspecStore.delete_dir when the underlying fs is s3fs, by passing a list of filepaths to be removed in bulk instead of one-by-one.
Resolves #2659
TODO: