glob is not reporting paths when a folder doesn't contain objects #935

Open
NicholasFiorentini opened this issue Jan 31, 2025 · 2 comments


@NicholasFiorentini

Given the following S3 bucket structure:

root_bucket/
       something/
           target/
               another_folder/
                   object.txt
       target/
           object.txt
           another_folder/

Given the glob pattern s3://root_bucket/**/target, I expect to obtain s3://root_bucket/something/target and s3://root_bucket/target. However, only s3://root_bucket/target is returned.
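To make the expectation concrete, here is a minimal pure-Python sketch of the matching I have in mind. It is not how s3fs implements glob; `implicit_dirs` and `match_double_star_name` are hypothetical helpers, and `KEYS` is an assumed flat key listing of the bucket above, with the bucket name omitted.

```python
from pathlib import PurePosixPath

# Hypothetical object keys mirroring the bucket layout above (bucket name omitted).
KEYS = [
    "something/target/another_folder/object.txt",
    "target/object.txt",
]


def implicit_dirs(keys):
    """Return every directory-like prefix implied by the given object keys."""
    dirs = set()
    for key in keys:
        # parents yields "a/b", "a", "." for "a/b/c"; drop the "." root.
        for parent in PurePosixPath(key).parents:
            if str(parent) != ".":
                dirs.add(str(parent))
    return dirs


def match_double_star_name(keys, name):
    """Directories at any depth whose last path component equals `name`."""
    return sorted(d for d in implicit_dirs(keys) if PurePosixPath(d).name == name)


print(match_double_star_name(KEYS, "target"))
# ['something/target', 'target']
```

Under this model, both "target" prefixes match, including the one that holds no objects directly.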

This happens on AWS S3, and it is easily reproducible with moto:

from typing import Generator, NamedTuple

import pytest
import s3fs


class Fixture(NamedTuple):
    base_url: str
    mock_s3_client: s3fs.S3FileSystem


@pytest.fixture(scope="function")
def build(mock_s3_client: s3fs.S3FileSystem) -> Generator[Fixture, None, None]:
    # setup
    base_url = "s3://mock-bucket/fake-project"
    mock_s3_client.mkdir(base_url)

    # First subfolder: target holds no objects directly, only nested folders.
    mock_s3_client.mkdir(f"{base_url}/something")
    mock_s3_client.mkdir(f"{base_url}/something/target")
    mock_s3_client.mkdir(f"{base_url}/something/target/folder1")
    mock_s3_client.touch(f"{base_url}/something/target/folder1/file.xml")
    mock_s3_client.mkdir(f"{base_url}/something/target/folder2")
    mock_s3_client.touch(f"{base_url}/something/target/folder2/file.xml")

    # Second subfolder: target holds an object directly, plus nested folders.
    mock_s3_client.mkdir(f"{base_url}/target")
    mock_s3_client.touch(f"{base_url}/target/example.txt")
    mock_s3_client.mkdir(f"{base_url}/target/folder3")
    mock_s3_client.touch(f"{base_url}/target/folder3/file.xml")
    mock_s3_client.mkdir(f"{base_url}/target/folder4")
    mock_s3_client.touch(f"{base_url}/target/folder4/file.xml")

    # verify folder structure
    assert set(mock_s3_client.ls(base_url)) == {
        "mock-bucket/fake-project/something",
        "mock-bucket/fake-project/target",
    }

    # run
    yield Fixture(
        base_url=base_url,
        mock_s3_client=mock_s3_client,
    )


def test_find_subfolders(build: Fixture) -> None:
    glob_pattern = f"{build.base_url}/**/target"
    result = build.mock_s3_client.glob(glob_pattern)

    # glob() returns paths without the "s3://" protocol prefix, as ls() does above.
    assert len(result) == 2
    assert set(result) == {
        "mock-bucket/fake-project/something/target",
        "mock-bucket/fake-project/target",
    }
@martindurant
Member

Thanks for posting.

Perhaps it might be useful to phrase this as a test case in a PR, so that we can then get on with fixing it.

> mock_s3_client.mkdir(f"{base_url}/target/folder3")

I should note that mkdir doesn't do anything, because S3 does not have folders. In some tools, a folder is equivalent to a zero-length file with the same name plus a "/" suffix; however, the API creates implicit folders when they contain something, and this is the convention that s3fs uses.
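As a rough illustration of the two conventions, here is a small pure-Python model of one level of a delimiter-based listing. It is only a sketch: `list_level` is a hypothetical helper, not an s3fs or boto3 function, and the keys are made up to resemble the test above.

```python
def list_level(keys, prefix=""):
    """Model one "/"-delimited listing: direct objects plus common prefixes."""
    files, common_prefixes = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if "/" in rest:
            # Anything deeper collapses into a single "folder" entry.
            common_prefixes.add(prefix + rest.split("/", 1)[0] + "/")
        elif rest:
            files.append(key)
    return sorted(files), sorted(common_prefixes)


# Implicit folder: folder3/ shows up only because a key exists beneath it.
print(list_level(["target/example.txt", "target/folder3/file.xml"], "target/"))
# (['target/example.txt'], ['target/folder3/'])

# Marker convention: a zero-length key ending in "/" also yields a folder entry.
print(list_level(["target/example.txt", "target/folder4/"], "target/"))
# (['target/example.txt'], ['target/folder4/'])

# No key under a prefix means no folder at all, whatever mkdir was asked to do.
print(list_level(["target/example.txt"], "target/"))
# (['target/example.txt'], [])
```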

@NicholasFiorentini
Author

> Thanks for posting.
>
> Perhaps it might be useful to phrase this as a test case in a PR, so that we can then get on with fixing it.
>
> mock_s3_client.mkdir(f"{base_url}/target/folder3")

Thanks for the suggestion. I cannot promise a PR anytime soon due to time constraints on my current projects.

> I should note that mkdir doesn't do anything, because S3 does not have folders. In some tools, a folder is equivalent to a zero-length file with the same name plus a "/" suffix; however, the API creates implicit folders when they contain something, and this is the convention that s3fs uses.

Yes, working with folder-like objects in S3 is always tricky. However, I see the same behavior when using a real AWS S3 bucket with the "folders" created manually from the UI.
