Describe the bug
Hi @Wauplin
As discussed on Slack (https://huggingface.slack.com/archives/C039P47V1L5/p1680688264050959), it turns out that in some cases opening a PR on the Hub breaks accented characters. Examples: https://huggingface.co/mideind/nmt-doc-is-en-2022-10/discussions/1 and https://huggingface.co/Wauplin/test_encoding/discussions/5/files.
The problem persists after upgrading the library (from 0.11.0 to 0.13.3).
The problem does not seem to come from the card either, since the accents are correctly displayed:
str(card)
'---\nlanguage:\n- is\n- en\n- multilingual\ntags:\n- translation\ninference:\n parameters:\n src_lang: is_IS\n tgt_lang: en_XX\n decoder_start_token_id: 2\n max_length: 512\nwidget:\n- text: Einu sinni átti ég hest. Hann var svartur og hvítur.\nhuggingface_hub: 0.13.3\n---\n\n# mBART based translation model\nThis model was trained to translate multiple sentences at once, compared to one sentence at a time.\n\nIt will occasionally combine sentences or add an extra sentence.\n\nThis is the same model as are provided on CLARIN: https://repository.clarin.is/repository/xmlui/handle/20.500.12537/278\n'
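For completeness, here is a minimal sketch of how this check can be reproduced (the assertion string is just one of the accented fragments from the widget example above, and "refs/pr/5" refers to the second example PR linked above):

from huggingface_hub import ModelCard, hf_hub_download

# The in-memory card content is fine: the accents load correctly
card = ModelCard.load("mideind/nmt-doc-is-en-2022-10")
assert "Einu sinni átti ég hest" in str(card)

# The README attached to the example PR, however, comes back with the
# accented characters corrupted ("refs/pr/5" is the PR from the link above)
path = hf_hub_download("Wauplin/test_encoding", "README.md", revision="refs/pr/5")
with open(path, "rb") as f:
    print(f.read())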
The requested information is below; don't hesitate to let me know if you need anything else to help find a solution.
Reproduction
from huggingface_hub import ModelCard, __version__

card = ModelCard.load("mideind/nmt-doc-is-en-2022-10")
card.data.huggingface_hub = __version__
card.push_to_hub(
    repo_id="Wauplin/test_encoding",
    create_pr=True,
    commit_message=f"Update ModelCard using huggingface_hub {__version__}",
)
Note: this bug was first noticed in January, during a wave of PRs adding "multilingual" to the language tag of multilingual models. The code was exactly the same as above (apart from the repo name), with one extra line before the push_to_hub call: card.data.language = card.data.language + ["multilingual"]
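For reference, a minimal sketch of what that January script looked like; the repo_id below is only an illustration (those PRs targeted the model repos themselves), and the only difference from the Reproduction above is the extra language line:

from huggingface_hub import ModelCard, __version__

card = ModelCard.load("mideind/nmt-doc-is-en-2022-10")
card.data.language = card.data.language + ["multilingual"]  # the extra line mentioned above
card.data.huggingface_hub = __version__
card.push_to_hub(
    repo_id="mideind/nmt-doc-is-en-2022-10",  # illustrative repo_id, not the exact one used
    create_pr=True,
    commit_message=f"Update ModelCard using huggingface_hub {__version__}",
)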
Logs
See https://huggingface.co/Wauplin/test_encoding/discussions
System info
- huggingface_hub version: 0.13.3
- Platform: Windows-10-10.0.18362-SP0
- Python version: 3.8.5
- Running in iPython ?: Yes
- iPython shell: ZMQInteractiveShell
- Running in notebook ?: Yes
- Running in Google Colab ?: No
- Token path ?: C:\Users\lbourdois\.cache\huggingface\token
- Has saved token ?: True
- Who am I ?: lbourdois
- Configured git credential helpers:
- FastAI: N/A
- Tensorflow: 2.9.0
- Torch: 1.10.0
- Jinja2: 2.11.2
- Graphviz: 0.16
- Pydot: 1.4.2
- Pillow: 9.0.0
- hf_transfer: N/A
- ENDPOINT: https://huggingface.co
- HUGGINGFACE_HUB_CACHE: C:\Users\lbourdois\.cache\huggingface\hub
- HUGGINGFACE_ASSETS_CACHE: C:\Users\lbourdois\.cache\huggingface\assets
- HF_TOKEN_PATH: C:\Users\lbourdois\.cache\huggingface\token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False