
Conversation

@12010486 (Collaborator)

What does this PR do?

Refactor of #2128 after the v1.19-release branch was updated.
The original description is copied here for reference.

The latest version of datasets no longer supports `trust_remote_code`, and with it, any loading scripts.

As a consequence, when running

PT_HPU_LAZY_MODE=1 HF_DATASETS_TRUST_REMOTE_CODE=true QUANT_CONFIG=/root/optimum-habana/examples/text-generation/quantization_config/maxabs_measure.json TQDM_DISABLE=1 python3 run_lm_eval.py --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --warmup 0 --use_hpu_graphs -o test_results_measure.json --bf16 --batch_size 1 --use_kv_cache --trim_logits --attn_softmax_bf16 --bucket_size=128 --bucket_internal --trust_remote_code --tasks hellaswag

with datasets==4.0.0 installed, we get:

`trust_remote_code` is not supported anymore.
Please check that the Hugging Face dataset 'hellaswag' isn't based on a loading script and remove `trust_remote_code`.
If the dataset is based on a loading script, please ask the dataset author to remove it and convert it to a standard format like Parquet.
07/10/2025 09:11:55 - ERROR - datasets.load - `trust_remote_code` is not supported anymore.
Please check that the Hugging Face dataset 'hellaswag' isn't based on a loading script and remove `trust_remote_code`.
If the dataset is based on a loading script, please ask the dataset author to remove it and convert it to a standard format like Parquet.
README.md: 6.84kB [00:00, 10.9MB/s]
hellaswag.py: 4.36kB [00:00, 8.86MB/s]
Traceback (most recent call last):
  File "/root/optimum-habana/examples/text-generation/run_lm_eval.py", line 384, in <module>
    main()
  File "/root/optimum-habana/examples/text-generation/run_lm_eval.py", line 347, in main
    results = evaluator.simple_evaluate(lm, tasks=args.tasks, limit=args.limit_iters, log_samples=log_samples)
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/utils.py", line 422, in _wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/evaluator.py", line 240, in simple_evaluate
    task_dict = get_task_dict(tasks, task_manager)
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/tasks/__init__.py", line 619, in get_task_dict
    task_name_from_string_dict = task_manager.load_task_or_group(
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/tasks/__init__.py", line 415, in load_task_or_group
    collections.ChainMap(*map(self._load_individual_task_or_group, task_list))
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/tasks/__init__.py", line 315, in _load_individual_task_or_group
    return _load_task(task_config, task=name_or_config)
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/tasks/__init__.py", line 281, in _load_task
    task_object = ConfigurableTask(config=config)
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/api/task.py", line 823, in __init__
    self.download(self.config.dataset_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lm_eval/api/task.py", line 934, in download
    self.dataset = datasets.load_dataset(
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 1392, in load_dataset
    builder_instance = load_dataset_builder(
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 1132, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 1031, in dataset_module_factory
    raise e1 from None
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 989, in dataset_module_factory
    raise RuntimeError(f"Dataset scripts are no longer supported, but found {filename}")
RuntimeError: Dataset scripts are no longer supported, but found hellaswag.py
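
In short, the failure above comes down to the following behavior; a minimal sketch, assuming datasets>=4.0.0 is installed and that hellaswag still ships a loading script:

```python
# Minimal reproduction sketch, assuming datasets>=4.0.0 is installed.
import datasets

try:
    # With datasets<4.0, `trust_remote_code=True` allowed the hellaswag.py
    # loading script to run and the dataset to load.
    datasets.load_dataset("hellaswag", trust_remote_code=True)
except RuntimeError as err:
    # datasets>=4.0 removed loading scripts (and `trust_remote_code`), so the
    # same call raises "Dataset scripts are no longer supported, but found hellaswag.py".
    print(f"load_dataset failed: {err}")
```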

@astachowiczhabana merged commit 4161cbe into huggingface:v1.19-release on Jul 18, 2025 (1 check passed).
@regisss (Collaborator) commented Jul 21, 2025

I think this should be done for LM eval only.
What are the datasets where this issue was raised for audio classification, stable diffusion training and text generation?

@12010486 (Collaborator, Author)

@regisss, for stable diffusion there was an issue in ControlNet training with the fusing/fill50k dataset. For text generation it was https://huggingface.co/datasets/JulesBelveze/tldr_news, and for audio classification it was the dataset you already added in your draft, plus superb.

@12010486 (Collaborator, Author)

I basically checked the examples that had --trust_remote_code, even the ones we don't explicitly test in CI, but only where the pinned datasets version had already been moved to >=3.0.2.
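
One way to express this kind of version-dependent handling is a small shim like the sketch below; the helper name and the 4.0.0 threshold are assumptions, not the exact diff in this PR:

```python
# Illustrative sketch only: forward `trust_remote_code` to datasets releases
# that still accept it, and drop it otherwise. The helper name and the 4.0.0
# threshold are assumptions, not the exact change made in this PR.
from packaging import version

import datasets


def load_dataset_compat(path, *args, trust_remote_code=False, **kwargs):
    if version.parse(datasets.__version__) < version.parse("4.0.0"):
        kwargs["trust_remote_code"] = trust_remote_code
    # On datasets>=4.0 the argument is gone, and script-based datasets must be
    # converted to a standard format (e.g. Parquet) before they can load.
    return datasets.load_dataset(path, *args, **kwargs)
```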

@regisss (Collaborator) commented Jul 21, 2025

After looking more into it, there is another issue with Datasets v4: it relies on torchcodec for audio decoding, which is compatible with Torch 2.7 only. So let's keep these changes, and we'll update everything once torchcodec is supported on Gaudi.
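
A minimal runtime guard expressing that constraint could look like the sketch below; the version numbers are taken from the comment above, and the check itself is only an illustration:

```python
# Sketch of a runtime guard, assuming datasets>=4.0 decodes audio via torchcodec
# and torchcodec requires torch>=2.7, as discussed above.
from packaging import version

import datasets
import torch

needs_torchcodec = version.parse(datasets.__version__) >= version.parse("4.0.0")
torch_release = version.parse(torch.__version__.split("+")[0])  # drop local build suffix

if needs_torchcodec and torch_release < version.parse("2.7"):
    raise RuntimeError(
        "datasets>=4.0 decodes audio via torchcodec, which requires torch>=2.7; "
        "pin datasets<4.0 until torchcodec is supported on Gaudi."
    )
```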

@12010486 deleted the trust_remote_unsupported_upd branch on July 31, 2025.
gplutop7 pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request on Oct 15, 2025:

Fix for datasets based on a loading script (…huggingface#453)

Co-authored-by: Silvia Colabrese <[email protected]>