
Fix path traversal when loading OneFormer image processor metadata #46270

Closed
LinZiyuu wants to merge 1 commit into huggingface:main from LinZiyuu:fix-oneformer-metadata-path-traversal


LinZiyuu (Contributor) commented May 28, 2026

The `class_info_file` and `repo_path` fields of the OneFormer image processor config are loaded verbatim from `preprocessor_config.json` by `from_pretrained`, but they are untrusted. `class_info_file` is documented to live inside `repo_path`, yet it flows straight into `os.path.join(repo_path, class_info_file)` -> `open(...)` -> `json.load` in `load_metadata`, which runs during the processor's `__init__`. A value like `"../../secret.json"` or an absolute path escapes `repo_path`, so loading a malicious model via `AutoImageProcessor.from_pretrained(...)` (no `trust_remote_code`) reads an arbitrary local JSON file off the victim's machine.
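
To make the escape concrete, here is a minimal sketch of how the join behaves. It is not the library's code: `naive_metadata_path` and the `/models/oneformer` directory are hypothetical names used only for illustration, and `posixpath` is used so the results are the same on every platform.

```python
import posixpath


def naive_metadata_path(repo_path: str, class_info_file: str) -> str:
    """Hypothetical stand-in for an unchecked join of the two config fields."""
    # Join without validation, then normalize so the final target is visible.
    return posixpath.normpath(posixpath.join(repo_path, class_info_file))


repo_path = "/models/oneformer"  # assumed local model directory

# A relative value with ".." climbs out of repo_path.
print(naive_metadata_path(repo_path, "../../secret.json"))  # /secret.json

# An absolute value makes join() discard repo_path entirely.
print(naive_metadata_path(repo_path, "/etc/secret.json"))  # /etc/secret.json
```

In both cases the resulting path lies outside `repo_path`, which is the behavior the description above refers to.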

The fix verifies the resolved metadata path stays inside `repo_path` before reading it, allowing files in subdirectories but rejecting `..`/absolute escapes. Applied to both the torchvision and PIL backends, with regression tests for the escape and the allowed-subdirectory case.
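
A containment check of this kind could look roughly like the sketch below. This is an assumption about the general technique, not the code in this PR: the helper name `resolve_metadata_path` and its error message are hypothetical.

```python
import os


def resolve_metadata_path(repo_path: str, class_info_file: str) -> str:
    """Return the resolved metadata path, or raise if it leaves repo_path.

    Hypothetical helper illustrating the check; not the PR's actual code.
    """
    # Resolve symlinks and ".." segments so the comparison uses real paths.
    base = os.path.realpath(repo_path)
    candidate = os.path.realpath(os.path.join(base, class_info_file))

    # commonpath() equals base only when candidate is base or lies below it.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(
            f"class_info_file {class_info_file!r} resolves outside repo_path"
        )
    return candidate
```

With this sketch, a value such as `sub/info.json` resolves normally, while `../../secret.json` or an absolute path raises `ValueError` before any file is opened.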

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

The `class_info_file` and `repo_path` fields of the OneFormer image
processor config are loaded verbatim from `preprocessor_config.json` by
`from_pretrained`, but they are untrusted. `class_info_file` is documented
to live inside `repo_path`, yet it flows straight into
`os.path.join(repo_path, class_info_file)` -> `open(...)` -> `json.load`
in `load_metadata`, which runs during the processor's `__init__`. A value
like "../../secret.json" or an absolute path escapes `repo_path`, so
loading a malicious model via `AutoImageProcessor.from_pretrained(...)`
(no `trust_remote_code`) reads an arbitrary local JSON file off the
victim's machine. This is a sibling of the Bark (huggingface#46237) and chat-template
(huggingface#46191) path traversals.

Verify the resolved metadata path stays inside `repo_path` before reading
it, allowing files in subdirectories but rejecting `..`/absolute escapes.
Applied to both the torchvision and PIL backends, with regression tests
for the escape and the allowed-subdirectory case.

github-actions (Contributor) commented

[For maintainers] Suggested jobs to run (before merge)

run-slow: oneformer

molbap (Contributor) commented May 29, 2026

This doesn’t seem like a real security boundary: both `repo_path` and `class_info_file` come from the same untrusted processor config, so constraining one relative to the other does not prevent a malicious config from choosing a broader local base path. I don't see how an attacker would benefit from this without already having critical access. Closing for now, feel free to reopen if there's a misunderstanding.
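
The objection can be illustrated with a small sketch. It assumes a containment check based on `commonpath`; `is_inside` and the example paths are hypothetical, and `posixpath` is used so no real files are touched.

```python
import posixpath


def is_inside(repo_path: str, class_info_file: str) -> bool:
    """Hypothetical containment check: does the joined path stay in repo_path?"""
    base = posixpath.normpath(repo_path)
    target = posixpath.normpath(posixpath.join(base, class_info_file))
    return posixpath.commonpath([base, target]) == base


# With a fixed, trusted base the check rejects the escape.
print(is_inside("/models/oneformer", "../../etc/secret.json"))  # False

# If the same config also sets repo_path, it can simply widen the base.
malicious_config = {"repo_path": "/", "class_info_file": "etc/secret.json"}
print(is_inside(**malicious_config))  # True
```

Because the second call passes, the check only constrains `class_info_file` when `repo_path` itself comes from a trusted source.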

