[HfFileSystem] Reuse caching when downloading a file #1452

@Wauplin

In hffs, we implement _fetch_range, which allows retrieving bytes from a remote file without downloading it entirely (see fsspec). This is nice when downloading only parts of a file, but if we want to download it entirely, it would be best to benefit from the existing hf_hub_download and the HF cache system it relies on.

@mariosasko @lhoestq given your knowledge of fsspec, do you think it would be possible to override the read method so that if read is called with length=-1, we cache the entire file and read it from disk? And if length!=-1, we fall back to the normal implementation. Do you see any weird side effects that this could cause?
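
To make the idea concrete, here is a minimal sketch of the dispatch using only the standard library. `CachedReadFile`, `fake_fetch_range` and `fake_download` are hypothetical stand-ins, not the actual hffs classes: in a real implementation the class would presumably subclass the fsspec buffered file, `download_to_cache` would wrap hf_hub_download, and `fetch_range` would be the existing _fetch_range. One side effect it tries to handle: a full read issued after a partial read has to resume from the current position rather than from byte 0.

```python
import os
import tempfile
from typing import Callable, Optional


class CachedReadFile:
    """Toy file object (hypothetical, not the real hffs file class)."""

    def __init__(
        self,
        fetch_range: Callable[[int, int], bytes],
        download_to_cache: Callable[[], str],
    ) -> None:
        self._fetch_range = fetch_range  # stand-in for _fetch_range
        self._download_to_cache = download_to_cache  # stand-in for hf_hub_download
        self._local_path: Optional[str] = None
        self.loc = 0  # current read position

    def read(self, length: int = -1) -> bytes:
        if length < 0:
            # Full read: download the whole file once, then read from disk.
            if self._local_path is None:
                self._local_path = self._download_to_cache()
            with open(self._local_path, "rb") as f:
                f.seek(self.loc)  # resume after any earlier partial read
                data = f.read()
        else:
            # Partial read: keep the normal range-based behaviour.
            data = self._fetch_range(self.loc, self.loc + length)
        self.loc += len(data)
        return data


# Usage with fake "remote" content and call counters.
REMOTE = b"0123456789"
calls = {"range": 0, "download": 0}


def fake_fetch_range(start: int, end: int) -> bytes:
    calls["range"] += 1
    return REMOTE[start:end]


def fake_download() -> str:
    calls["download"] += 1
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(REMOTE)
    return path


f = CachedReadFile(fake_fetch_range, fake_download)
head = f.read(3)  # served by the range fetcher
rest = f.read()   # triggers a single full download, read from disk
```

In this sketch the cached path is remembered on the file object, so repeated full reads do not trigger another download.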

Also for _fetch_range instead of always fetching from remote, we could try to find the file locally first.
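
A rough sketch of the "local first" lookup, again with only the standard library. `fetch_range_local_first` and its `remote_fetch` argument are hypothetical names; how the local path is resolved is left open here (huggingface_hub's `try_to_load_from_cache` might serve as the lookup, but that wiring is an assumption).

```python
import os
import tempfile
from typing import Callable, Optional


def fetch_range_local_first(
    local_path: Optional[str],
    start: int,
    end: int,
    remote_fetch: Callable[[int, int], bytes],
) -> bytes:
    """Return bytes [start, end) from the cached copy if present, else remotely."""
    if local_path is not None and os.path.isfile(local_path):
        with open(local_path, "rb") as f:
            f.seek(start)
            return f.read(end - start)
    # No cached copy: fall back to the remote range request.
    return remote_fetch(start, end)


# Usage: a temporary file plays the role of the cached copy.
CONTENT = b"abcdefghij"
remote_calls = []


def fake_remote_fetch(start: int, end: int) -> bytes:
    remote_calls.append((start, end))
    return CONTENT[start:end]


fd, cached_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as tmp:
    tmp.write(CONTENT)

from_disk = fetch_range_local_first(cached_path, 2, 5, fake_remote_fetch)
from_remote = fetch_range_local_first(None, 2, 5, fake_remote_fetch)
os.remove(cached_path)
```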


Otherwise, I saw that fsspec also defines a BaseCache object that we could extend. Do you think it's worth trying to tweak it to use our existing cache?

Labels: enhancement (New feature or request)