Port ESMC and ESMFold2 to Transformers #46419
Open
Rocketknight1 wants to merge 70 commits into main from port-esmc-esmfold2
+6,028 −0
Commits
22f6692 Port ESMC + ESMFold2 model code from fork onto v5 main (baseline, una… (Rocketknight1)
49c7a4a Strip vendored Triton kernels + set_kernel_backend selector from esmf… (Rocketknight1)
39f5331 Remove vendored distributed/ (2D context-parallel) stack from esmfold2 (Rocketknight1)
88701d2 Refactor ESMC attention onto the v5 ALL_ATTENTION_FUNCTIONS interface (Rocketknight1)
a42a910 Remove transformer_engine dependency from ESMC (pure-PyTorch only) (Rocketknight1)
3c1c65b Modernize ESMC rotary to the standard (cos, sin) + apply_rotary_pos_e… (Rocketknight1)
abfe0a0 Add modular_esmc.py; generate modeling_esmc.py from it (Rocketknight1)
a1d91a7 Fix ESMC checkpoint loading + bidirectional attention (verified vs fo… (Rocketknight1)
091ec18 Fix ESMCTokenizer docstring example + add tokenizer tests (Rocketknight1)
cce0178 Drop vestigial vocab_file entry from ESMCTokenizer VOCAB_FILES_NAMES (Rocketknight1)
33afd38 Add ESMC model tests + make _init_weights cover all modules (Rocketknight1)
6bd3270 Add ESMC docs + resolve auto_docstring checkpoint for the classificat… (Rocketknight1)
52c9519 Remove the SAE (esmc_sae) from ESMC — deferred to a follow-up PR (Rocketknight1)
f42978a Normalize ESMC docstrings to satisfy check_docstrings (Rocketknight1)
e082193 Get ESMFold2 importing: add __all__ + load ESMC via the Auto registry (Rocketknight1)
4a435e0 Drop the experimental ESMFold2 variant + esmfold2_v2 mapping (release… (Rocketknight1)
5e656ab Remove TransformerEngine fp8 path from ESMFold2 (Rocketknight1)
7676ea1 Route ESMFold2 plain self-attention through the v5 attention interface (Rocketknight1)
0a22192 Convert ESMFold2Config to the v5 @strict / PreTrainedConfig style (Rocketknight1)
e4ea090 Add ESMFold2 tests + sub-config model_types (Rocketknight1)
9bc2602 Add ESMFold2 model doc page (Rocketknight1)
47d0099 Ruff-format the ESMFold2 fork files (deferred style sweep) (Rocketknight1)
d442c1e Convert ESMCConfig to the v5 @strict / @auto_docstring style (Rocketknight1)
da35ca4 Fix ESMFold2 integration test: ubiquitin + correct 0-1 pLDDT/pTM scale (Rocketknight1)
3e03361 Make ESMFold2 dtype-honest: drop in-model autocast, support from_pret… (Rocketknight1)
99b0b49 Apply make fix-repo sweep + bf16 ESMFold2 usage example (Rocketknight1)
870be2c Keep ESMFold2 sub-configs internal (drop their model_types) (Rocketknight1)
203e73e Add kernel, update docs (Rocketknight1)
1520d00 Some cleanup (Rocketknight1)
60de6ca More cleanup (Rocketknight1)
2fd1060 dtype cleanups to make outputs match the original fork (Rocketknight1)
784ece7 More chasing dtypes (Rocketknight1)
fbe1db0 More bf16 to match the original + bump speed (Rocketknight1)
0bba9d7 Remove more float upcasts for speeds + closer fork match (Rocketknight1)
17d93c0 Remove more float upcasts for speeds + closer fork match (Rocketknight1)
8f5b306 More trunk norm matching (Rocketknight1)
561dfc5 Big general cleanup, no more _common.py, lots of repo standardization (Rocketknight1)
bbf8d8d Remove a lot of redundant dtype casts now that we no longer need them (Rocketknight1)
e7d8a17 Rename a lot of config attributes to the standard ones (Rocketknight1)
5c436c4 Modular cleanup, use standard output types (Rocketknight1)
61d175f More modular cleanup (Rocketknight1)
af1a479 More modular cleanup (Rocketknight1)
58610a4 More modular cleanup (Rocketknight1)
e975f64 Even more modular cleanup (Rocketknight1)
f0c8b77 No more modular ESMFold2 (Rocketknight1)
7837559 Dead code cleanup (Rocketknight1)
682ff80 No more apply_torch_compile (Rocketknight1)
1a2c984 Simplify the obvious einsums but leave the very messy ones (Rocketknight1)
ad9abd2 Config cleanup, stop passing naked kwargs around (Rocketknight1)
4b0e39d Small docs cleanups (Rocketknight1)
206e06e Get rid of CPU-only path that we don't need (Rocketknight1)
2e4e470 Test cleanup (Rocketknight1)
7c7d8c8 Doc dates (Rocketknight1)
a76c930 Comments cleanup (Rocketknight1)
3579775 Re-add the kernel after rebase (Rocketknight1)
8a2e3bc Fix dates (Rocketknight1)
6667c24 Push ESMC fixes (Rocketknight1)
1ccebba Fix the tokenizer to match Llama (Rocketknight1)
e9afa19 Make the token classifier generic, attention cleanup, big modular red… (Rocketknight1)
c884d5e Import the sequence classifier (Rocketknight1)
284ad3d More review fixes (Rocketknight1)
425ba1b make fix-repo (Rocketknight1)
db59333 tokenizer fixup (Rocketknight1)
852145b Remove FA2 for esmfold2, lift some constants into the config (Rocketknight1)
7612e47 Big refactor to address reviewer comments (Rocketknight1)
96b5089 Bundle args together, drop some dead args (Rocketknight1)
3faf471 date fixup for CI (Rocketknight1)
9a48a84 date fixup for CI (Rocketknight1)
111a3eb Move more stuff to config, merge more swiglus (Rocketknight1)
78df1b0 Cleaning up some constants (Rocketknight1)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,102 @@ | ||
| <!--Copyright 2026 The HuggingFace Team. All rights reserved. | ||
|
|
||
| Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with | ||
| the License. You may obtain a copy of the License at | ||
|
|
||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
|
|
||
| Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on | ||
| an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the | ||
| specific language governing permissions and limitations under the License. | ||
|
|
||
| ⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be | ||
| rendered properly in your Markdown viewer. | ||
|
|
||
| --> | ||
| *This model was contributed to Hugging Face Transformers on 2026-07-02.* | ||
|
|
||
| # ESMC | ||
|
|
||
| ## Overview | ||
|
|
||
| ESMC (ESM Cambrian) is a family of protein language models released by [BioHub](https://biohub.org/). | ||
| It is a bidirectional Transformer encoder trained with a masked-language-modelling objective over amino-acid sequences. | ||
| Like [ESM-2](./esm), ESMC produces per-residue representations that are useful for downstream protein modelling tasks. | ||
|
|
||
| ESMC is suitable for fine-tuning on protein classification or token classification tasks. It is also used as the | ||
| backbone of [ESMFold2](./esmfold2), where it generates representations that are used as input to the folding head. | ||
|
|
||
| Pre-trained checkpoints are available on the Hugging Face Hub: | ||
|
|
||
| - [`biohub/ESMC-300M`](https://huggingface.co/biohub/ESMC-300M) | ||
| - [`biohub/ESMC-600M`](https://huggingface.co/biohub/ESMC-600M) | ||
| - [`biohub/ESMC-6B`](https://huggingface.co/biohub/ESMC-6B) | ||
|
|
||
| ## Usage example | ||
|
|
||
| ESMC is registered with the auto classes (`AutoModel`, `AutoModelForMaskedLM`, | ||
| `AutoModelForSequenceClassification`, `AutoModelForTokenClassification`). | ||
|
|
||
| <hfoptions id="usage"> | ||
| <hfoption id="Pipeline"> | ||
|
|
||
| ```python | ||
| import torch | ||
| from transformers import pipeline | ||
|
|
||
| extractor = pipeline( | ||
|     task="feature-extraction", | ||
|     model="biohub/ESMC-300M", | ||
| ) | ||
| # Per-residue representations of shape (batch, sequence_length, hidden_size). | ||
| representations = extractor("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", return_tensors=True) | ||
| ``` | ||
|
|
||
| </hfoption> | ||
| <hfoption id="AutoModel"> | ||
|
|
||
| ```python | ||
| import torch | ||
| from transformers import AutoModel, AutoTokenizer | ||
|
|
||
| tokenizer = AutoTokenizer.from_pretrained("biohub/ESMC-300M") | ||
| model = AutoModel.from_pretrained("biohub/ESMC-300M") | ||
|
|
||
| inputs = tokenizer("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", return_tensors="pt") | ||
| with torch.no_grad(): | ||
|     outputs = model(**inputs) | ||
|
|
||
| # Per-residue representations of shape (batch, sequence_length, hidden_size). | ||
| representations = outputs.last_hidden_state | ||
| ``` | ||
|
|
||
| </hfoption> | ||
| </hfoptions> | ||
|
|
||
| ## ESMCConfig | ||
|
|
||
| [[autodoc]] ESMCConfig | ||
|
|
||
| ## ESMCTokenizer | ||
|
|
||
| [[autodoc]] ESMCTokenizer | ||
|
|
||
| ## ESMCModel | ||
|
|
||
| [[autodoc]] ESMCModel | ||
| - forward | ||
|
|
||
| ## ESMCForMaskedLM | ||
|
|
||
| [[autodoc]] ESMCForMaskedLM | ||
| - forward | ||
|
|
||
| ## ESMCForSequenceClassification | ||
|
|
||
| [[autodoc]] ESMCForSequenceClassification | ||
| - forward | ||
|
|
||
| ## ESMCForTokenClassification | ||
|
|
||
| [[autodoc]] ESMCForTokenClassification | ||
| - forward |
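
The ESMC overview above describes the model as trained with a masked-language-modelling objective over amino-acid sequences. As a rough illustration of what that kind of input corruption looks like, here is a minimal sketch of generic BERT-style masking applied to a protein sequence. The 15% mask rate, the `<mask>` placeholder string, and the helper name `mask_sequence` are all assumptions made for illustration; the PR does not state ESMC's actual masking recipe or special-token names.

```python
import random

# Assumption: a placeholder mask string, not a confirmed ESMC special token.
MASK_TOKEN = "<mask>"


def mask_sequence(sequence: str, mask_rate: float = 0.15, seed: int = 0):
    """Replace a random subset of residues with MASK_TOKEN.

    Returns the corrupted token list and a {position: original_residue} dict,
    which is what a masked-LM loss would be computed against.
    """
    residues = list(sequence)
    if not residues:
        return [], {}

    rng = random.Random(seed)  # local RNG so the sketch is reproducible
    n_to_mask = max(1, round(len(residues) * mask_rate))
    positions = sorted(rng.sample(range(len(residues)), n_to_mask))

    labels = {pos: residues[pos] for pos in positions}
    for pos in positions:
        residues[pos] = MASK_TOKEN
    return residues, labels


tokens, labels = mask_sequence("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
print("".join(t if t != MASK_TOKEN else "_" for t in tokens))
print(labels)
```

In practice the tokenizer and a data collator would handle this step; the sketch only shows the idea that the model sees the corrupted sequence and is asked to recover the residues stored in `labels`.
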
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,84 @@ | ||
| <!--Copyright 2026 The HuggingFace Team. All rights reserved. | ||
|
|
||
| Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with | ||
| the License. You may obtain a copy of the License at | ||
|
|
||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
|
|
||
| Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on | ||
| an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the | ||
| specific language governing permissions and limitations under the License. | ||
|
|
||
| ⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be | ||
| rendered properly in your Markdown viewer. | ||
|
|
||
| --> | ||
| *This model was contributed to Hugging Face Transformers on 2026-07-02.* | ||
|
|
||
| # ESMFold2 | ||
|
|
||
| ## Overview | ||
|
|
||
| ESMFold2 is an all-atom protein structure prediction model. It predicts 3D coordinates and per-residue confidence | ||
| (pLDDT, PAE, PDE) directly from an amino-acid sequence, using the [ESMC](./esmc) protein language model as its | ||
| backbone. The architecture combines a sliding-window atom encoder with 3D rotary position embeddings, a pairwise | ||
| folding trunk applied iteratively, a diffusion-based structure head, and a confidence head. | ||
|
|
||
| The model checkpoints are available on the Hugging Face Hub at | ||
| [`biohub/ESMFold2`](https://huggingface.co/biohub/ESMFold2) and [`biohub/ESMFold2-Fast`](https://huggingface.co/biohub/ESMFold2-Fast). | ||
|
|
||
| ## Usage example | ||
|
|
||
| ```python | ||
| import torch | ||
|
|
||
| from transformers import ESMFold2Model | ||
|
|
||
| # The ESMC backbone is bundled in the checkpoint and loaded with the model. | ||
| # bf16 is the recommended inference precision. | ||
| model = ESMFold2Model.from_pretrained("biohub/ESMFold2", dtype=torch.bfloat16).cuda().eval() | ||
|
|
||
| pdb_string = model.infer_protein_as_pdb("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ") | ||
| with open("prediction.pdb", "w") as f: | ||
|     f.write(pdb_string) | ||
| ``` | ||
|
|
||
| `infer_protein` returns the raw outputs (atom coordinates, distogram logits and confidence metrics) as an | ||
| [`~models.esmfold2.modeling_esmfold2.ESMFold2Output`] if you need them instead of a PDB string. You may get | ||
| slightly different predictions if you run the same sequence multiple times. Set a manual seed (for example with | ||
| `torch.manual_seed`) before inference if you want exactly reproducible structures. | ||
|
|
||
| ## Faster inference with a fused kernel | ||
|
|
||
| The folding trunk's dominant cost is the triangle-multiplication update. Passing `use_kernels=True` to | ||
| [`~PreTrainedModel.from_pretrained`] swaps it for a fused Triton kernel loaded from the Hub via the | ||
| [`kernels`](https://github.com/huggingface/kernels) library, leaving the prediction unchanged. It is inference-only and | ||
| CUDA-only; on CPU or without the kernel installed the model transparently falls back to the pure-PyTorch implementation. | ||
| Make sure the model is on a CUDA device when kernelization happens (e.g. with `device_map`). | ||
|
|
||
| ```python | ||
| import torch | ||
|
|
||
| from transformers import ESMFold2Model | ||
|
|
||
| model = ESMFold2Model.from_pretrained( | ||
|     "biohub/ESMFold2", dtype=torch.bfloat16, device_map="cuda", use_kernels=True | ||
| ).eval() | ||
|
|
||
| pdb_string = model.infer_protein_as_pdb("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ") | ||
| ``` | ||
|
|
||
| ## ESMFold2Config | ||
|
|
||
| [[autodoc]] ESMFold2Config | ||
|
|
||
| ## ESMFold2PreTrainedModel | ||
|
|
||
| [[autodoc]] ESMFold2PreTrainedModel | ||
|
|
||
| ## ESMFold2Model | ||
|
|
||
| [[autodoc]] ESMFold2Model | ||
| - forward | ||
| - infer_protein | ||
| - infer_protein_as_pdb |
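
The ESMFold2 usage example above writes the string returned by `infer_protein_as_pdb` straight to disk. If you want to inspect the prediction in Python instead, a minimal sketch of pulling the Cα atoms out of a PDB-formatted string is shown below. It relies only on the fixed-width column layout of standard PDB `ATOM` records. The helper names `CaAtom` and `parse_ca_atoms` are hypothetical, and whether ESMFold2 stores a confidence value such as pLDDT in the B-factor column is an assumption that this PR does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaAtom:
    chain: str
    residue_number: int
    residue_name: str
    x: float
    y: float
    z: float
    # Assumption: predictors often store per-residue confidence here,
    # but this is not confirmed for ESMFold2.
    b_factor: float


def parse_ca_atoms(pdb_string: str) -> list[CaAtom]:
    """Collect one record per C-alpha atom from PDB-formatted text."""
    atoms = []
    for line in pdb_string.splitlines():
        # Standard PDB columns (1-based): 13-16 atom name, 18-20 residue name,
        # 22 chain ID, 23-26 residue number, 31-54 x/y/z, 61-66 B-factor.
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        if line[12:16].strip() != "CA":
            continue
        atoms.append(
            CaAtom(
                chain=line[21],
                residue_number=int(line[22:26]),
                residue_name=line[17:20].strip(),
                x=float(line[30:38]),
                y=float(line[38:46]),
                z=float(line[46:54]),
                b_factor=float(line[60:66]),
            )
        )
    return atoms
```

With a model loaded as in the example above, `parse_ca_atoms(pdb_string)` would give one entry per residue, which is a convenient starting point for distance calculations or per-residue plots. For anything beyond quick inspection, a dedicated structure-parsing library is the more robust choice.
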
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| # Copyright 2026 Biohub. All rights reserved. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| from typing import TYPE_CHECKING | ||
|
|
||
| from ...utils import _LazyModule | ||
| from ...utils.import_utils import define_import_structure | ||
|
|
||
|
|
||
| if TYPE_CHECKING: | ||
|     from .configuration_esmc import * | ||
|     from .modeling_esmc import * | ||
|     from .tokenization_esmc import * | ||
| else: | ||
|     import sys | ||
|
|
||
|     _file = globals()["__file__"] | ||
|     sys.modules[__name__] = _LazyModule(__name__, _file, define_import_structure(_file), module_spec=__spec__) |
Would it be possible to port this to kernels community? Lemme nudge internally. If you need help, check the kernels channel #kernels
Yeah, this was a placeholder location while I was working on the PR! We should definitely move this before merging