[Feature] Can we record layer_id for DiT model? #9836
Comments
You can assign me if you think it is reasonable. Open to discussion.
Thanks for the issue!
@yiyixuxu https://github.com/Zefan-Cai/PyramidKV and https://arxiv.org/html/2407.11550v3 are LLM papers that use layerwise features. I believe we can use the same strategies in DiT models as well. It reminds me of quantized models, where different layers can use different strategies. In the current framework we would need to pass layer_id through the pipeline, transformer_block, block, Attention, and attention_processor, which makes it hard to implement even simple modifications based on layer_id.
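For reference, one way to get per-layer behaviour today without threading layer_id through every call is to assign a different attention processor per block. This is only a minimal sketch: it assumes the model exposes `attn_processors` / `set_attn_processor` (as the diffusers transformer models do), that the processor keys contain a `transformer_blocks.<i>.` segment, and the `LayerAwareProcessor` class and the depth cutoff are made up for illustration.

```python
import re

from diffusers.models.attention_processor import AttnProcessor2_0


class LayerAwareProcessor(AttnProcessor2_0):
    """Hypothetical processor that records which layer it serves."""

    def __init__(self, layer_id: int):
        super().__init__()
        self.layer_id = layer_id
        # a real layerwise algorithm (e.g. a PyramidKV-style budget) would
        # branch on self.layer_id inside __call__


pipe = ...  # any DiT-style pipeline, e.g. PixArt or SD3
transformer = pipe.transformer

processors = {}
for name in transformer.attn_processors:
    # parse the block index out of keys like "transformer_blocks.3.attn1.processor"
    match = re.search(r"transformer_blocks\.(\d+)\.", name)
    layer_id = int(match.group(1)) if match else -1
    # deeper blocks get the hypothetical layer-aware processor, the rest keep the default
    processors[name] = LayerAwareProcessor(layer_id) if layer_id >= 20 else AttnProcessor2_0()

transformer.set_attn_processor(processors)
```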
I think recording extra information in models for inference-only purposes is not a good enough reason to support it. This information can easily be made available at the end-user level with some simple code like:

```python
pipe = ...
transformer = pipe.transformer
for index, block in enumerate(transformer.transformer_blocks):
    block.layer_id = index
```

While I agree that it would be super helpful for debugging, this kind of change introduces extra maintenance effort and adds another step for model authors when integrating their research contributions (for every kind of additional information we would like to maintain). Maybe we could provide debugging utils that create wrappers on the models with this kind of information instead? If you feel the above idea is not helpful, or you have additional thoughts on why this would be impactful to have, I would be happy to hear them and help with the relevant changes!
I agree that using a wrapper is a good design approach. I also want to point out something additional: the snippet above only adds information to transformer_blocks, not to the inner attention and MLP components. My thought is that a utility wrapper could automatically add debugging information to all inner blocks at once. Would this approach work for you? That way we can ensure comprehensive debugging information across the entire model hierarchy in a clean and maintainable way.
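A rough sketch of what such a utility could look like (a hypothetical helper, not an existing diffusers API), assuming the transformer keeps its blocks in `transformer_blocks` and the inner modules are the usual `Attention` and `FeedForward` classes:

```python
import torch.nn as nn

from diffusers.models.attention import FeedForward
from diffusers.models.attention_processor import Attention


def annotate_layer_ids(transformer: nn.Module) -> nn.Module:
    """Stamp every block and its inner attention/MLP modules with the block index."""
    for index, block in enumerate(transformer.transformer_blocks):
        block.layer_id = index
        for module in block.modules():
            if isinstance(module, (Attention, FeedForward)):
                module.layer_id = index
    return transformer


# usage: annotate once after loading; any debug hook or attention processor
# can then read `module.layer_id` at inference time
annotate_layer_ids(pipe.transformer)
```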
I like the idea of debugging utils and being able to inject this information if "debug" mode is enabled. cc @DN6
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Is your feature request related to a problem? Please describe.
Some layerwise algorithms may be based on the layer id.
This only needs some simple modifications to Transformer2DModel and its inner modules, such as the attention and batch_norm parts: just pass the layer_id as an extra parameter.
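To illustrate how a layerwise algorithm could consume such an id without changing any library signatures: an attention processor already receives the Attention module as its first argument, so it can read a `layer_id` attribute that was attached externally (for example by the annotation helper sketched earlier in the thread). The attribute name, the depth threshold, and the branch below are assumptions for illustration only.

```python
from diffusers.models.attention_processor import Attention, AttnProcessor2_0


class LayerwiseAttnProcessor(AttnProcessor2_0):
    """Sketch: branch on an externally attached `layer_id` attribute."""

    def __call__(self, attn: Attention, hidden_states, *args, **kwargs):
        layer_id = getattr(attn, "layer_id", None)
        if layer_id is not None and layer_id >= 24:
            # hypothetical layerwise rule: deep layers could reuse cached
            # features, use a cheaper attention variant, lower precision, etc.
            pass
        return super().__call__(attn, hidden_states, *args, **kwargs)
```

Combined with `set_attn_processor` or the annotation helper above, this keeps the per-layer logic entirely in user code rather than in the model definitions.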