HfApi.dataset_info is case insensitive  #1453

@polinaeterna

Describe the bug

HfApi.dataset_info returns the same info for repo names that differ only in letter case, so it returns info for dataset names that do not exist.

For example, HfApi.dataset_info('MBZUAI/Bactrian-X') and HfApi.dataset_info('mbzuai/bactrian-x') return the same info even though the second dataset, mbzuai/bactrian-x, does not exist. This causes unexpected behavior in the datasets library: the differently cased names are loaded with different loaders, while the second one should not be loaded at all.

Reproduction

from huggingface_hub import HfApi

api = HfApi(endpoint="https://huggingface.co")

# Request the same dataset with three different casings
info1 = api.dataset_info('MBZUAI/Bactrian-X')  # the casing that actually exists
info2 = api.dataset_info('mbzuai/bactrian-x')
info3 = api.dataset_info('MbZuAi/bactrian-X')  # any random casing

info1.id == info2.id == info3.id
>> True
info1.sha == info2.sha == info3.sha
>> True
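
Until the lookup is made case sensitive, a caller could guard against this by comparing the requested name with the `id` that comes back. The sketch below is only an illustration, not part of huggingface_hub: the helper names `is_exact_repo_id` and `casing_mismatch` are hypothetical, and it assumes the returned `id` carries the casing of the dataset that actually exists (the output above only shows that the ids are equal across casings).

```python
from typing import Optional


def is_exact_repo_id(requested: str, returned: Optional[str]) -> bool:
    """Return True only if the returned id equals the requested id, case included."""
    return returned is not None and requested == returned


def casing_mismatch(requested: str, returned: Optional[str]) -> bool:
    """Return True if the two ids are different strings that differ only by letter case."""
    return (
        returned is not None
        and requested != returned
        and requested.lower() == returned.lower()
    )


# Hypothetical usage with the API from the reproduction (needs network access):
#   info = api.dataset_info(repo_id)
#   if not is_exact_repo_id(repo_id, info.id):
#       raise ValueError(f"{repo_id!r} does not exist; did you mean {info.id!r}?")
```

The comparison is done on the client side, so it does not change what the Hub returns; it only lets the caller reject a name whose casing does not match.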

Logs

No response

System info

- huggingface_hub version: 0.14.1
- Platform: Linux-5.14.0-1059-oem-x86_64-with-glibc2.31
- Python version: 3.9.16
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: /home/polina/.cache/huggingface/token
- Has saved token ?: False
- Configured git credential helpers: store
- FastAI: N/A
- Tensorflow: 2.11.0
- Torch: 1.13.1
- Jinja2: 3.1.2
- Graphviz: N/A
- Pydot: N/A
- Pillow: 9.4.0
- hf_transfer: N/A
- gradio: N/A
- ENDPOINT: https://huggingface.co
- HUGGINGFACE_HUB_CACHE: /home/polina/.cache/huggingface/hub
- HUGGINGFACE_ASSETS_CACHE: /home/polina/.cache/huggingface/assets
- HF_TOKEN_PATH: /home/polina/.cache/huggingface/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False

Labels: bug (Something isn't working)
