Support for Exporting Specific Sub-Modules (e.g., Encoder, Decoder) #2148

happyme531 opened this issue Jan 3, 2025 · 0 comments
Feature request

Currently, when converting transformer models (such as T5, but potentially others) to ONNX with the Optimum library, a single ONNX file encompassing the entire model architecture (both encoder and decoder) appears to be generated. This happens regardless of which task option is selected during conversion:

```
optimum-cli export onnx --model . . --task text-classification
optimum-cli export onnx --model . . --task feature-extraction
```

I propose a feature that provides users with more granular control over the ONNX export process. Specifically, this feature should allow users to selectively export specific sub-modules of a transformer model, such as:

  • Only the encoder
  • Only the decoder
  • Potentially other distinct components of the model

This enhancement would enable users to optimize ONNX models for specific use cases where only a portion of the full model is required.
Evidence of both the feasibility of and the need for this feature is the existence of separately exported encoder and decoder ONNX models for various transformer architectures on Hugging Face.

Motivation

I am encountering a limitation with the current ONNX export functionality in Optimum. When converting transformer models, the resulting ONNX file invariably includes the entire model, even when I only require a specific part, like the encoder.

This is frustrating because:

  • Increased Model Size: The generated ONNX model is larger than necessary, consuming more storage and potentially impacting loading times.
  • Performance Overhead: When deploying the ONNX model for tasks that only utilize a specific sub-module (e.g., using only the encoder for embedding generation), the presence of the unnecessary decoder can introduce performance overhead.
  • Lack of Flexibility: The current approach lacks the flexibility to tailor the exported ONNX model to specific application needs.

As observed on Hugging Face, users have successfully exported individual components (like encoders and decoders) of various transformer models to ONNX. This indicates that it's technically possible and a desirable workflow. The Optimum library should provide a more direct and user-friendly way to achieve this without requiring manual workarounds.

Your contribution

While my direct expertise in the internal workings of the Optimum library for ONNX export is limited, I am willing to contribute by:

  • Testing: Thoroughly testing any implementation of this feature on various transformer models.
  • Providing Feedback: Offering detailed feedback on the usability and effectiveness of the new feature.
  • Sharing Use Cases: Providing specific use cases and examples that highlight the benefits of this functionality.