Feat: Add ONNX export support for LightOn OCR models #129

Open
remidesbois1 wants to merge 6 commits into huggingface:main from remidesbois1:feat/lighton-ocr-support

@remidesbois1

Note: This PR supersedes #128, which was an experimental draft, and provides the clean, final implementation.

This PR adds full ONNX export support for LightOn OCR Vision-Language Models (e.g., lightonai/LightOnOCR-2-1B) to optimum.

The export pipeline correctly splits the model into 3 dedicated ONNX sub-components: vision_encoder, embed_tokens, decoder_merged.

Since the official models on the Hub use "mistral3" as their model_type in config.json, LightonOcrOnnxConfig is natively registered to handle the mistral3 architecture for image-text-to-text tasks. (This might not be the right approach, so I'm open to suggestions.)

FORCE_ONNX_EXTERNAL_DATA="1" is enforced only during the merge_decoders step of the config, to work around the 2 GB protobuf limit.
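
For illustration, here is a minimal sketch of scoping an environment variable to a single step and restoring the previous value afterwards. `temporary_env` is a hypothetical helper, not the code in this PR, and the sketch assumes the merge code reads the variable at call time rather than caching it at import time.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(name, value):
    """Set an environment variable inside the block, then restore the old state."""
    previous = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        # Restore the prior value, or remove the variable if it was unset before.
        if previous is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous


with temporary_env("FORCE_ONNX_EXTERNAL_DATA", "1"):
    # The decoder merge would run here (placeholder, not the real call).
    value_during_merge = os.environ["FORCE_ONNX_EXTERNAL_DATA"]
```

Restoring the variable in `finally` keeps the override from leaking into the export of the other sub-components, even if the merge raises.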

Testing: Added lighton_ocr to the CI test suite with a tiny dummy model.

Note: The export completes successfully and the output files are valid, but the validation step fails with a ShapeError on the present keys (e.g. (2, 2, 32, 8) vs (2, 2, 16, 8)). Since the logits are accurate, this appears to be a false positive related to how the DynamicCache outputs are returned compared with the ONNX outputs. Let me know if there's a preferred way to handle this validation check!
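
To narrow down the mismatch, a small diagnostic like the one below can list which axes differ between the two reported shapes. `mismatched_axes` is a hypothetical helper written for this note, not part of optimum's validation code.

```python
def mismatched_axes(first_shape, second_shape):
    """Return (axis, first_dim, second_dim) for every axis where the shapes differ."""
    if len(first_shape) != len(second_shape):
        raise ValueError(
            f"rank mismatch: {len(first_shape)} vs {len(second_shape)}"
        )
    return [
        (axis, a, b)
        for axis, (a, b) in enumerate(zip(first_shape, second_shape))
        if a != b
    ]


# The two shapes quoted in the ShapeError above.
print(mismatched_axes((2, 2, 32, 8), (2, 2, 16, 8)))  # [(2, 32, 16)]
```

Only axis 2 differs. If the cache layout is (batch, heads, sequence, head_dim), which is an assumption here, the two sides disagree on the cached sequence length by a factor of two while batch size, head count, and head dimension all match.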

Registers lighton_ocr as a model type and exports it as three separate
ONNX files: vision_encoder (ViT + projector), embed_tokens (embedding
table), and decoder_model_merged (language model with merged KV cache
support). Handles weight key remapping from lighton_ocr to Mistral3
internals and works around the >2GB protobuf limit during decoder merge.
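
As an illustration of the weight key remapping mentioned in this commit message, the sketch below renames checkpoint keys by prefix. The prefixes in `PREFIX_MAP` are hypothetical placeholders, not the actual mapping used in this PR, and `remap_keys` is a hypothetical helper.

```python
# Hypothetical prefixes for illustration only; the real mapping lives in the PR.
PREFIX_MAP = {
    "source_vision.": "target_vision.",
    "source_projector.": "target_projector.",
}


def remap_keys(state_dict, prefix_map):
    """Return a new dict whose keys have matching prefixes rewritten."""
    remapped = {}
    for key, value in state_dict.items():
        new_key = key
        for old_prefix, new_prefix in prefix_map.items():
            if key.startswith(old_prefix):
                new_key = new_prefix + key[len(old_prefix):]
                break
        # Refuse to silently overwrite a weight if two keys collapse to one name.
        if new_key in remapped:
            raise KeyError(f"duplicate key after remapping: {new_key}")
        remapped[new_key] = value
    return remapped


example = {"source_vision.layer0.weight": "w0", "language_model.embed.weight": "w1"}
print(remap_keys(example, PREFIX_MAP))
```

Keys that match no prefix pass through unchanged, so weights that already use the target naming need no special handling.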
remidesbois1 marked this pull request as draft on April 28, 2026 13:29
remidesbois1 marked this pull request as ready for review on April 28, 2026 13:50
@remidesbois1
Author

Note on Dynamo:
I've experimented with the new --dynamo exporter. While the vision_encoder and embed_tokens components export correctly with adjusted dynamic axes, the decoder_with_past currently fails. This appears to be because convert_dynamic_axes_into_dynamic_shapes does not yet handle the nested tuple structure of past_key_values (it fails the sum(_ is not None for _ in v) == 1 check). For now, this PR relies on the standard TorchScript-based export, which is fully functional. If you have any ideas on how to implement this, I'd be happy to learn.
