Add detailed ConvBERT model card with usage, architecture, and refere… #38470
Conversation
Hi! This is my first contribution — just checking in kindly for a review.
Thanks, this is a nice start but I think you should revisit the template in the issue to make sure your model card follows the same format and language! As an example, take a look at the BERT docs :)
@@ -0,0 +1,125 @@
<!-- ConvBERT model card -->
You don't have to remove this
<!-- ConvBERT model card -->
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
<div style="float: right;">
    <div class="flex flex-wrap space-x-1">
        <img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
    </div>
</div>
Missing the TensorFlow badge, and this should go above `# ConvBERT`
---

## Model Overview
ConvBERT is a lightweight and efficient NLP transformer model introduced by YituTech. It improves on the classic BERT architecture by incorporating **span-based dynamic convolutions** into the self-attention mechanism. This hybrid approach enables ConvBERT to model both local and global dependencies more effectively while reducing the computational cost.
Suggested change:

[ConvBERT](https://huggingface.co/papers/2008.02496) incorporates a mixed attention block that makes it more efficient than [BERT](./bert). Attention is costly because it models global word relationships. This is inefficient because some heads only learn local word relationships. ConvBERT replaces some of the attention heads with a convolution head to handle this. The result of this new mixed attention design is a more lightweight model with lower training costs without compromising performance.

Instead of using attention heads everywhere to model global word relationships, ConvBERT also includes convolution heads to model local word relationships.
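To make the mixed attention idea concrete, here is a loose toy sketch (not the actual ConvBERT implementation, which uses span-based dynamic convolutions with learned, input-dependent kernels): the model dimension is split between ordinary self-attention heads for global context and a cheap depthwise convolution branch for local context. The class name and dimensions are illustrative assumptions.

```python
# Toy sketch of a "mixed attention" block: half the channels go through
# self-attention (global word relationships), half through a depthwise
# convolution (local word relationships). NOT the real ConvBERT code.
import torch
import torch.nn as nn

class MixedAttentionBlock(nn.Module):
    def __init__(self, dim=64, num_heads=2, kernel_size=9):
        super().__init__()
        half = dim // 2
        # Attention branch: models long-range, global dependencies.
        self.attn = nn.MultiheadAttention(half, num_heads, batch_first=True)
        # Depthwise conv branch: models local dependencies at lower cost.
        self.conv = nn.Conv1d(half, half, kernel_size,
                              padding=kernel_size // 2, groups=half)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):  # x: (batch, seq_len, dim)
        a, c = x.chunk(2, dim=-1)
        a, _ = self.attn(a, a, a)                          # global branch
        c = self.conv(c.transpose(1, 2)).transpose(1, 2)   # local branch
        return self.proj(torch.cat([a, c], dim=-1))

block = MixedAttentionBlock()
out = block(torch.randn(2, 16, 64))
print(out.shape)  # torch.Size([2, 16, 64])
```

Because the convolution branch is depthwise and fixed-width, it scales linearly with sequence length, which is where the lower training cost mentioned in the suggested text comes from.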
What does this PR do?
This PR adds a detailed and standardized model card for ConvBERT to improve Hugging Face Transformers documentation.
Includes: `pipeline`, `AutoModel`, and CLI usage examples.

File added: `src/transformers/models/convbert/modelcard.md`
This contribution helps improve model discoverability and provides users with accessible and actionable information about ConvBERT.
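For reference, the kind of `pipeline` usage example the card describes could look like the sketch below. This assumes the public `YituTech/conv-bert-base` checkpoint on the Hugging Face Hub and a fill-mask task (ConvBERT uses a BERT-style `[MASK]` token); it is an illustration, not the card's exact snippet.

```python
# Hedged sketch of a pipeline usage example for ConvBERT.
# Assumes the YituTech/conv-bert-base checkpoint is available on the Hub.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="YituTech/conv-bert-base")
preds = fill_mask("The goal of life is [MASK].")
# Each prediction is a dict with keys like "token_str" and "score".
print(preds[0]["token_str"])
```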
cc: @stevhliu (documentation reviewer)
Fixes: N/A