Add native masked MSE loss for Sapiens2ForPoseEstimation by Sainava · Pull Request #46764 · huggingface/transformers

Sainava · 2026-06-19T08:40:29Z

What does this PR do?

As discussed with the maintainers in the linked issue, this PR implements a native supervised pose-estimation loss directly in Sapiens2ForPoseEstimation, enabling fine-tuning with the Trainer API.

Specific Changes:

Added an optional target_weights parameter to the forward signature to handle keypoint visibility masking, following the masking behavior of OpenMMLab's KeypointMSELoss without adding external dependencies.
Implemented the masked MSE loss calculation using pure PyTorch, explicitly casting masks to the heatmap dtype to ensure safe fp16/bf16 mixed-precision training.
Added strict shape and dimensionality validation for supervision targets to prevent silent broadcasting errors.
Updated the forward docstring to explicitly document the new parameter.
Updated test_modeling_sapiens2.py to verify the loss computation both with and without target_weights.
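The masking, dtype cast, and shape validation listed above can be sketched in a few lines of pure PyTorch. This is a minimal sketch, not the PR's actual code. It assumes heatmaps and targets of shape (batch_size, num_keypoints, height, width) and visibility weights that are 4D and broadcastable to the heatmaps, e.g. (batch_size, num_keypoints, 1, 1). The helper name masked_heatmap_mse is hypothetical, and the "unreduced loss times mask, then mean" pattern is only in the spirit of OpenMMLab's KeypointMSELoss, not a port of it.

```python
import torch
import torch.nn.functional as F


def masked_heatmap_mse(heatmaps, target, target_weights=None):
    """Hypothetical helper: MSE over keypoint heatmaps with optional visibility masking.

    heatmaps, target: (batch_size, num_keypoints, height, width)
    target_weights:   None, or a 4D tensor broadcastable to heatmaps,
                      e.g. (batch_size, num_keypoints, 1, 1)
    """
    # Strict shape check: a mismatched target could otherwise broadcast silently.
    if target.shape != heatmaps.shape:
        raise ValueError(
            f"target shape {tuple(target.shape)} != heatmap shape {tuple(heatmaps.shape)}"
        )

    # Per-element squared error, no reduction yet.
    loss = F.mse_loss(heatmaps, target, reduction="none")

    if target_weights is not None:
        if target_weights.dim() != heatmaps.dim():
            raise ValueError(
                f"target_weights must be {heatmaps.dim()}D, got {target_weights.dim()}D"
            )
        # Cast the mask to the heatmap dtype so a float32 mask does not
        # promote fp16/bf16 losses to float32 (and vice versa).
        loss = loss * target_weights.to(heatmaps.dtype)

    # Mean over all elements; masked entries are included but contribute zero.
    return loss.mean()
```

With one of three keypoints masked out and a constant squared error of 1, the mean drops to 2/3, because the masked entries still count in the denominator.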

All local make fix-repo and make check-repo checks have passed.

Scope of this PR

This implementation follows the masking behavior of OpenMMLab's KeypointMSELoss through optional target_weights support. It intentionally does not implement skip_empty_channel or the configurable loss_weight parameter from OpenMMLab, as neither is currently exposed through the Sapiens2 Transformers configuration. The goal of this PR is to provide native supervised pose-estimation training support while keeping the initial implementation focused and aligned with existing Transformers conventions.

I confirm that this is not a pure code agent PR.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline and the Pull Request checks?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case. Investigate training support for Sapiens2ForPoseEstimation when labels are provided #46518 (comment)
Did you make sure to update the documentation with your changes according to the guidelines?
Did you write any new necessary tests?

Who can review?

Hi @guarin, following up on the discussion in #46518, I've put together a draft implementation for supervised pose-estimation loss support in Sapiens2ForPoseEstimation. I'd appreciate any feedback when you have a chance.


guarin

Thank you for the PR! Enabling pose estimation fine-tuning will be a great addition!

Left some comments :) Ideally you could also add a small example in sapiens2.md on how to pass the labels and get the loss.

Sainava · 2026-06-19T17:54:49Z

Hi @guarin, thanks for the feedback!

I've updated the PR to follow the suggested loss-function architecture, moved the pose-estimation test inputs into a dedicated factory method, and added a supervised fine-tuning example to the documentation.

One implementation detail: _loss_function is wrapped with staticmethod(...) so it works correctly with the self.loss_function property and avoids Python binding the model instance as the first argument.
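The binding problem can be reproduced without PyTorch. The classes below are hypothetical stand-ins, and their loss_function property is a simplified version that only returns _loss_function; it is not the actual Transformers implementation. A plain function stored on a class becomes a bound method when read through an instance, so the instance is silently passed as the first argument; staticmethod turns that binding off.

```python
def squared_error(input, target):
    """Stand-in for torch.nn.functional.mse_loss: any plain Python function behaves the same."""
    return (input - target) ** 2


class PlainAttribute:
    # A plain function on the class is bound when accessed via self,
    # so calling it passes (self, input, target): one argument too many.
    _loss_function = squared_error

    @property
    def loss_function(self):
        return self._loss_function


class StaticAttribute:
    # staticmethod disables binding, so the function comes back unchanged.
    _loss_function = staticmethod(squared_error)

    @property
    def loss_function(self):
        return self._loss_function
```

StaticAttribute().loss_function(3, 1) returns 4, while the same call on PlainAttribute raises a TypeError because three positional arguments reach a two-argument function.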

I'd really appreciate any further feedback when you have a chance :)

guarin

Thanks for the update and adding the docs! This is already looking pretty good, left minor comments on how we could simplify further :)

Let me know if you would be interested in adding the preprocessing as well (might be a bit tricky).

guarin · 2026-06-22T06:07:41Z

+            loss = self.loss_function(heatmaps, labels, reduction="none")
+
+            if label_weights is not None:
+                loss = (loss * label_weights).mean()
+            else:
+                loss = loss.mean()


Could we pass weights in all cases?

Suggested change

-loss = self.loss_function(heatmaps, labels, reduction="none")
-
-if label_weights is not None:
-    loss = (loss * label_weights).mean()
-else:
-    loss = loss.mean()
+loss = self.loss_function(heatmaps, labels, weight=weights)

This should work correctly whether weights is a Tensor or None, and it will make it easier for users to customise the loss function. Otherwise they have to override the full forward method to handle weights correctly.
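The customisation argument can be sketched with hypothetical stand-ins. Everything here (PoseHeadSketch, compute_loss, the two loss helpers) is illustrative, not Transformers code; the only assumption carried over from the thread is that a loss callable accepts (input, target, weight=None) and that the weights already have the same shape as the input. Because the call site always passes weight=..., swapping the loss only means replacing one class attribute.

```python
import torch


def weighted_squared_error(input, target, weight=None):
    """Hypothetical loss: mean squared error with optional same-shape element weights."""
    loss = (input - target) ** 2
    if weight is not None:
        loss = loss * weight
    return loss.mean()


def weighted_absolute_error(input, target, weight=None):
    """Hypothetical alternative loss with the same (input, target, weight=None) signature."""
    loss = (input - target).abs()
    if weight is not None:
        loss = loss * weight
    return loss.mean()


class PoseHeadSketch:
    # Hypothetical stand-in for the model; only the loss plumbing is shown.
    _loss_function = staticmethod(weighted_squared_error)

    @property
    def loss_function(self):
        return self._loss_function

    def compute_loss(self, heatmaps, labels, label_weights=None):
        # One call covers both cases: label_weights may be a Tensor or None.
        return self.loss_function(heatmaps, labels, weight=label_weights)


class L1PoseHeadSketch(PoseHeadSketch):
    # Changing the loss only requires overriding the attribute, not compute_loss/forward.
    _loss_function = staticmethod(weighted_absolute_error)
```

With zero predictions and labels of 2.0, the default sketch returns 4.0 and the L1 subclass returns 2.0 without touching compute_loss.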

guarin · 2026-06-22T06:10:38Z

+<hfoption id="Supervised fine-tuning (Pose estimation)">
+


Maybe rename this to be closer in form to the "Pose estimation" and "Pose estimation with flip augmentations" sections.

Suggested change

-<hfoption id="Supervised fine-tuning (Pose estimation)">
+<hfoption id="Pose estimation training">

guarin · 2026-06-22T06:21:49Z

+batch_size, num_keypoints = 1, 308
+heatmap_height, heatmap_width = 256, 192


You can assume here that the heatmaps have the same height and width as the preprocessed image. We'll have to add the pose preprocessing to the Sapiens2ImageProcessor which will convert it to the correct format and size. The original Sapiens2 code for this is here: https://github.com/facebookresearch/sapiens2/blob/main/sapiens/pose/src/datasets/transforms/pose_transforms.py

For the loss calculation you might have to interpolate the model outputs to match the label size again.
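If the model head emits heatmaps smaller than the image-sized labels, the outputs can be resized before the loss, as suggested here. A minimal sketch, assuming labels at the preprocessed image size (256x192 in the docs example) and bilinear resizing; the helper name and the choice of interpolation mode are assumptions, not what the PR or the original Sapiens2 code necessarily does.

```python
import torch
import torch.nn.functional as F


def heatmap_loss_at_label_resolution(heatmaps, labels, label_weights=None):
    """Hypothetical helper: resize predicted heatmaps to the label size, then take a weighted MSE.

    heatmaps: (batch_size, num_keypoints, h, w) as produced by the model head
    labels:   (batch_size, num_keypoints, H, W), assumed to match the preprocessed image
    """
    if heatmaps.shape[-2:] != labels.shape[-2:]:
        # Bilinear upsampling is an assumption; it keeps the op differentiable.
        heatmaps = F.interpolate(
            heatmaps, size=labels.shape[-2:], mode="bilinear", align_corners=False
        )
    loss = F.mse_loss(heatmaps, labels, reduction="none")
    if label_weights is not None:
        # label_weights assumed to have the same shape as labels.
        loss = loss * label_weights.to(loss.dtype)
    return loss.mean()
```

Gradients flow back through the interpolation, so the loss can be minimised with respect to the low-resolution model output. The 4x smaller output used when trying this out (64x48 against 256x192 labels) is an arbitrary illustrative choice.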

guarin · 2026-06-22T06:24:12Z

+
+        labels = torch.randn(


Suggested change

-labels = torch.randn(
+labels = floats_tensor(

Sainava · 2026-06-22T12:37:12Z

Hi @guarin, thanks for the review! I've pushed the suggested updates (tests and documentation).

Regarding the loss function: I tried passing weight=label_weights directly to self.loss_function, but since the default implementation resolves to torch.nn.functional.mse_loss, this raises:

ValueError: Weights and input must have the same size

when using OpenMMLab-style visibility weights of shape [batch_size, num_keypoints, 1, 1] against heatmaps of shape [batch_size, num_keypoints, height, width].

To preserve the original masking behavior, I'm currently computing the unreduced loss and applying the broadcasted weights explicitly.

Would you prefer that I instead move the weighting logic into a small custom _loss_function implementation so the forward pass can remain:

loss = self.loss_function(heatmaps, labels, weight=label_weights)

while preserving the same behavior?

And yes, I'd definitely be interested in working on the preprocessing side as well. My preference would be to get the training-loss support merged first and then tackle the preprocessing changes in a follow-up PR, if that's okay.

guarin · 2026-06-22T13:02:14Z

Hi! I think we can safely assume that label_weights has the same shape as labels in the forward function. The pre-processing can make sure that this is the case and expand any (batch_size, num_keypoints, 1, 1) tensors to match labels.shape so we don't have to worry about that in forward.
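The expansion step suggested here can be sketched as a small preprocessing helper. The function name is hypothetical, and accepting 2D (batch_size, num_keypoints) weights in addition to (batch_size, num_keypoints, 1, 1) is an assumption; where it would live in Sapiens2ImageProcessor is not decided in this thread.

```python
import torch


def expand_label_weights(label_weights, labels):
    """Hypothetical preprocessing step: expand per-keypoint weights to labels.shape.

    Accepts (batch_size, num_keypoints), (batch_size, num_keypoints, 1, 1),
    or an already full-size (batch_size, num_keypoints, H, W) tensor.
    """
    if label_weights.dim() == 2:
        # (B, K) -> (B, K, 1, 1) so it can broadcast over the heatmap grid.
        label_weights = label_weights[:, :, None, None]
    # expand_as returns a broadcasted view without copying; contiguous() materialises it.
    return label_weights.expand_as(labels).contiguous()
```

After expansion the weights have exactly labels.shape, which is the condition the "Weights and input must have the same size" check quoted above enforces, and the weighted mean is unchanged compared with broadcasting the (B, K, 1, 1) tensor.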


Sainava · 2026-06-22T15:18:40Z

Hi @guarin ! I've simplified the forward pass to use weight=label_weights directly in the loss function as suggested.
Thanks again for the guidance :)

vasqu

Overall pretty much ready, just some smaller comments from my side 🤗

vasqu · 2026-06-23T15:39:42Z

    """,
 )
 class Sapiens2ForPoseEstimation(Sapiens2PreTrainedModel):
+    _loss_function = staticmethod(torch.nn.functional.mse_loss)


Please let's add this to LOSS_MAPPING in transformers/src/transformers/loss/loss_utils.py instead (line 172 in 49f6b78):

LOSS_MAPPING = {

so something along the lines of "Sapiens2ForPoseEstimation": torch.nn.functional.mse_loss,
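A simplified sketch of why a mapping entry is enough: the model looks its loss up by class name instead of storing it as a class attribute. LOSS_MAPPING_SKETCH, ModelSketch, and the stand-in class below are hypothetical; the stand-in only shares the real model's name so the key matches, and the real resolution logic in Transformers handles more cases than this exact-name lookup.

```python
import torch
import torch.nn.functional as F

# Hypothetical, trimmed-down stand-in for transformers' LOSS_MAPPING.
LOSS_MAPPING_SKETCH = {
    "Sapiens2ForPoseEstimation": F.mse_loss,
}


class ModelSketch:
    """Hypothetical base class that resolves its loss by class name."""

    @property
    def loss_function(self):
        # Exact class-name lookup only; a simplification of the real mechanism.
        return LOSS_MAPPING_SKETCH[type(self).__name__]


class Sapiens2ForPoseEstimation(ModelSketch):
    # Stand-in sharing the real class's name; not the actual model.
    pass
```

A function read out of a dict is never bound to an instance, so this route also sidesteps the staticmethod workaround discussed earlier in the thread.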

vasqu · 2026-06-23T15:42:12Z

+        with torch.no_grad():
+            result_with_loss = model(
+                pixel_values,
+                labels=labels,
+            )
+
+        self.parent.assertIsNotNone(result_with_loss.loss)
+
+        with torch.no_grad():
+            result_with_weights = model(
+                pixel_values,
+                labels=labels,
+                label_weights=label_weights,
+            )
+
+        self.parent.assertIsNotNone(result_with_weights.loss)
+


IMO we should also check that the backward pass doesn't raise a runtime error, so I would avoid the no_grad blocks here and call backward.
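A backward check along these lines can be sketched with a tiny hypothetical model standing in for Sapiens2; the real test would use the Sapiens2 model tester's config and inputs instead. The point is that the forward pass runs with gradients enabled and backward() is actually called, with and without weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyHeatmapModel(nn.Module):
    """Hypothetical stand-in for a pose model: a 1x1 conv mapping pixels to keypoint heatmaps."""

    def __init__(self, num_channels=3, num_keypoints=4):
        super().__init__()
        self.head = nn.Conv2d(num_channels, num_keypoints, kernel_size=1)

    def forward(self, pixel_values, labels=None, label_weights=None):
        heatmaps = self.head(pixel_values)
        loss = None
        if labels is not None:
            loss = F.mse_loss(heatmaps, labels, reduction="none")
            if label_weights is not None:
                loss = loss * label_weights
            loss = loss.mean()
        return heatmaps, loss


def loss_backward_works(model, pixel_values, labels, label_weights=None):
    """Run forward with grad enabled (no torch.no_grad()) and call backward()."""
    model.zero_grad()
    _, loss = model(pixel_values, labels=labels, label_weights=label_weights)
    if loss is None:
        return False
    # Raises RuntimeError if the loss has no grad_fn, e.g. when computed under no_grad.
    loss.backward()
    # Every trainable parameter should have received a gradient.
    return all(p.grad is not None for p in model.parameters() if p.requires_grad)
```

Computing the loss under torch.no_grad() and then calling backward() fails with a RuntimeError, which is exactly the regression such a test would catch.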

github-actions · 2026-06-23T20:41:38Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: sapiens2

Sainava · 2026-06-23T21:05:09Z

Hi @vasqu, thanks for the review :) I've moved the loss to LOSS_MAPPING and added the .backward() tests as well.

Sainava and others added 3 commits June 19, 2026 13:50

Add native masked MSE loss for Sapiens2ForPoseEstimation (huggingface#46518)

181ff3e

Fix trailing whitespace in test file

49b0033

Merge branch 'main' into feat/46518-sapiens2-pose-loss

d83872e

guarin reviewed Jun 19, 2026


Review threads (now outdated) on src/transformers/models/sapiens2/modeling_sapiens2.py (two threads) and tests/models/sapiens2/test_modeling_sapiens2.py

Sainava added 2 commits June 19, 2026 22:28

Add supervised fine-tuning documentation for Sapiens2 pose estimation

8d8a847

Fix method binding bug on _loss_function by wrapping in staticmethod

fcfa574

Sainava added 2 commits June 19, 2026 23:27

Merge branch 'main' into feat/46518-sapiens2-pose-loss

f996ad9

Merge branch 'main' into feat/46518-sapiens2-pose-loss

4a99e4c

guarin reviewed Jun 22, 2026


Switch test to floats_tensor and fix docs heatmap dimensions

cdf9695

Merge branch 'main' into feat/46518-sapiens2-pose-loss

35ce0bc

Sainava added 2 commits June 22, 2026 20:21

Simplify loss function by relying on upstream weight expansion

5056650

Update dummy label_weights shape in tests to match heatmaps

e94d79f

vasqu approved these changes Jun 23, 2026


Centralize loss function and test backward pass

840b2c3



3 participants
