Something we realized today with @pmeier: even for pure detection tasks where masks aren't needed, the detection training references are still using the masks from COCO, which means that:
- those masks are decoded into images
- those masks get carried through every transform in the pipeline, e.g. here
Both of these are completely wasteful, since masks aren't needed for detection tasks. According to a simple benchmark, this significantly hurts performance.
(Not sure if that applies to Keypoints too, would need to check)
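One way this could be avoided is to strip the segmentation annotations at the dataset level, so the masks are never decoded or transformed in the first place. Below is a minimal sketch, not the current reference implementation: the wrapper class name is hypothetical, and it assumes the downstream target-preparation code tolerates annotations without a `"segmentation"` key and that the transforms callable accepts and returns an `(image, target)` pair.

```python
# Hypothetical sketch: a CocoDetection variant that drops segmentation
# annotations before any target conversion or augmentation runs, so masks
# are never decoded into dense arrays or passed through the transforms.
from torchvision.datasets import CocoDetection


class CocoDetectionNoMasks(CocoDetection):
    """CocoDetection wrapper for pure detection: no segmentation data."""

    def __init__(self, root, annFile, transforms=None):
        # Build the base dataset without transforms so we can strip the
        # annotations first and apply the transforms ourselves afterwards.
        super().__init__(root, annFile)
        self._detection_transforms = transforms

    def __getitem__(self, index):
        image, target = super().__getitem__(index)
        # Remove polygon/RLE data: it would otherwise be decoded into
        # full-size masks and carried through every augmentation.
        target = [
            {k: v for k, v in ann.items() if k != "segmentation"}
            for ann in target
        ]
        if self._detection_transforms is not None:
            image, target = self._detection_transforms(image, target)
        return image, target
```

With the field removed before the target conversion runs, the per-sample work is limited to boxes and labels, which is all a pure detection model consumes.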