
Add detailed ConvBERT model card with usage, architecture, and refere… #38470


Open
wants to merge 1 commit into main

Conversation


@Aesha19 Aesha19 commented May 29, 2025

What does this PR do?

This PR adds a detailed and standardized model card for ConvBERT to improve Hugging Face Transformers documentation.

Includes:

  • Model Overview and Architecture
  • Training objective and dataset details
  • Use cases and limitations
  • Code usage examples via pipeline, AutoModel, and CLI (see the usage sketch after this list)
  • Quantization and AttentionMaskVisualizer support (see the second sketch below)
  • Benchmarks and citation
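
The model card's usage examples follow the standard Transformers patterns. A minimal sketch of the pipeline and AutoModel paths, assuming the published YituTech/conv-bert-base checkpoint (a checkpoint shipped without a masked-LM head will warn about newly initialized weights when used with fill-mask):

```python
import torch
from transformers import pipeline, AutoTokenizer, AutoModel

# High-level pipeline API: fill-mask matches ConvBERT's masked language
# modeling pretraining objective
fill_mask = pipeline("fill-mask", model="YituTech/conv-bert-base")
print(fill_mask("The goal of life is [MASK]."))

# Lower-level AutoModel API: extract contextual embeddings directly
tokenizer = AutoTokenizer.from_pretrained("YituTech/conv-bert-base")
model = AutoModel.from_pretrained("YituTech/conv-bert-base")

inputs = tokenizer("ConvBERT mixes convolution and attention heads.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, seq_len, hidden_size)
```

For the quantization and visualization items, a sketch along these lines, assuming bitsandbytes is installed and a CUDA device is available (8-bit loading is a generic Transformers feature rather than anything ConvBERT-specific), and that the installed transformers version ships AttentionMaskVisualizer:

```python
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig

# Generic 8-bit quantization via bitsandbytes; requires a GPU
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForSequenceClassification.from_pretrained(
    "YituTech/conv-bert-base",
    quantization_config=bnb_config,
)

# Attention mask visualization (the import path may differ across versions)
from transformers.utils.attention_visualizer import AttentionMaskVisualizer

visualizer = AttentionMaskVisualizer("YituTech/conv-bert-base")
visualizer("ConvBERT mixes convolution and [MASK] heads.")
```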

File added:

  • src/transformers/models/convbert/modelcard.md

This contribution helps improve model discoverability and provides users with accessible and actionable information about ConvBERT.

cc: @stevhliu (documentation reviewer)


Fixes: N/A


Aesha19 commented May 29, 2025

Hi! This is my first contribution. Just kindly checking in for a review.
cc: @stevhliu


@stevhliu stevhliu left a comment


Thanks, this is a nice start but I think you should revisit the template in the issue to make sure your model card follows the same format and language! As an example, take a look at the BERT docs :)

@@ -0,0 +1,125 @@
<!-- ConvBERT model card -->

You don't have to remove this

Suggested change
<!-- ConvBERT model card -->
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->

Comment on lines +5 to +9
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
</div>
</div>

Missing the TensorFlow badge and this should go above # ConvBERT
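
A possible concrete fix, mirroring the badge markup shown above (the TensorFlow badge follows the same shields.io pattern used by other model cards; treat the exact color value as an assumption):

```html
<!-- Placed above the "# ConvBERT" heading, per the review comment -->
<div style="float: right;">
    <div class="flex flex-wrap space-x-1">
        <img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
        <img alt="TensorFlow" src="https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white">
    </div>
</div>
```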

Comment on lines +11 to +13
---

## Model Overview

Suggested change
---
## Model Overview


## Model Overview

ConvBERT is a lightweight and efficient NLP transformer model introduced by YituTech. It improves on the classic BERT architecture by incorporating **span-based dynamic convolutions** into the self-attention mechanism. This hybrid approach enables ConvBERT to model both local and global dependencies more effectively while reducing the computational cost.

Suggested change
ConvBERT is a lightweight and efficient NLP transformer model introduced by YituTech. It improves on the classic BERT architecture by incorporating **span-based dynamic convolutions** into the self-attention mechanism. This hybrid approach enables ConvBERT to model both local and global dependencies more effectively while reducing the computational cost.
[ConvBERT](https://huggingface.co/papers/2008.02496) incorporates a mixed attention block that makes it more efficient than [BERT](./bert). Attention is costly because it models global word relationships. This is inefficient because some heads only learn local word relationships. ConvBERT replaces some of the attention heads with a convolution head to handle this. The result of this new mixed attention design is a more lightweight model with lower training costs without compromising performance.
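
The mixed attention design described in the suggested text surfaces directly in the model's configuration. A minimal sketch, using the library defaults for illustration (head_ratio controls how many attention heads are reduced in favor of convolution heads, and conv_kernel_size sets the span of the dynamic convolution):

```python
from transformers import ConvBertConfig, ConvBertModel

config = ConvBertConfig()
print(config.num_attention_heads)  # base number of attention heads (default 12)
print(config.head_ratio)           # head reduction ratio in the mixed attention block (default 2)
print(config.conv_kernel_size)     # span of the dynamic convolution kernel (default 9)

# Randomly initialized model from the config, for architecture inspection only
model = ConvBertModel(config)
print(sum(p.numel() for p in model.parameters()))
```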
