Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Can we record layer_id for DiT model? #9836

Closed
foreverpiano opened this issue Nov 1, 2024 · 9 comments
Closed

[Feature] Can we record layer_id for DiT model? #9836

foreverpiano opened this issue Nov 1, 2024 · 9 comments
Labels
stale Issues that haven't received updates

Comments

@foreverpiano
Copy link

foreverpiano commented Nov 1, 2024

Is your feature request related to a problem? Please describe.
Some layerwise algorithm may be based on layer-id.
just need some simple modification for transformer2Dmodel and its inner module like attention part, batch_norm part. just pass the layer_id as an extra parameter.

@foreverpiano foreverpiano changed the title Can we record layer_id for DiT model? [Feature] Can we record layer_id for DiT model? Nov 1, 2024
@foreverpiano
Copy link
Author

You can assign me if you think it is reasonable. open to discuss.

@yiyixuxu
Copy link
Collaborator

yiyixuxu commented Nov 1, 2024

thanks for the issue
I think some examples on how the layer_id is needed and used would be helpful:)

@foreverpiano
Copy link
Author

foreverpiano commented Nov 2, 2024

@yiyixuxu https://github.com/Zefan-Cai/PyramidKV https://arxiv.org/html/2407.11550v3 these are LLM papers using layerwise feature. I believe that in DiT we can also use this strategies. It reminds me of the quantization model. We can choose different layer use different strategies. In current framework, we need to pass layer_id in pipeline, transformer_block, block, Attention, attention_processor, which is hard to make some simple modifications based on layer_id.

@foreverpiano
Copy link
Author

@yiyixuxu #9177 this also requires layerwise information. So I think that it is a trend to include layer_id.

@foreverpiano
Copy link
Author

And for debugging practice in #9508 and #9329, layer_id also helps with debugging.

@a-r-r-o-w
Copy link
Member

I think recording extra information in models for inference-only purposes is not a good reason to support them. This information can be easily made available at the end-user level by some simple code like:

pipe = ...
transformer = pipe.transformer

for index, block in enumerate(transformer.transformer_blocks):
   block.layer_id = index

While I agree that it would be super helpful for debugging, these kind of changes introduce extra maintainence efforts and will add another step for model authors when trying to integrate their research contributions (for every kind of additional information we would like to maintain). Maybe we could provide debugging utils that create wrappers on the models with this kind of information instead? If you feel the above idea is not helpful, or have additional thoughts on why this would be impactful to have, I would be happy to hear and help with relevant changes!

@foreverpiano
Copy link
Author

I agree that using a wrapper is a good design approach. I also want to point out something additional - the current modification only adds information to transformer_blocks, but not to the inner attention and MLP components. My thought is that a utility wrapper could automatically add debugging information to all inner blocks at once. Would this approach work for you? This way we can ensure comprehensive debugging information across the entire model hierarchy in a clean and maintainable way.

@a-r-r-o-w
Copy link
Member

I like the idea of debugging utils and being able to inject this information if "debug" mode is enabled. cc @DN6

Copy link

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Dec 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Issues that haven't received updates
Projects
None yet
Development

No branches or pull requests

3 participants