copybara-service bot commented Oct 31, 2025

[JAX SC] Parallelize device loop for extraction, sorting, grouping, bucketing and buffer filling.

  • 9.81% geomean reduction in wall time (~11% with FDO), with a 0.97% increase in CPU time and a 5.31% reduction in cycles.
  • Use a separate pool to avoid deadlocks. The fixed scheduling cost should be less than 0.1%.
  • Add default-constructible objects for parallelization.
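A minimal C++ sketch of the general pattern these bullets describe, under stated assumptions; it is not the JAX SC implementation. `DeviceResult`, `ProcessDevice`, and `ProcessAllDevices` are hypothetical names, the per-device work is a toy stand-in for extraction, sorting, and grouping, and the "separate pool" is modeled as a set of dedicated `std::thread` workers rather than any particular thread-pool library. Making the per-device result default-constructible lets the output vector be sized up front so each worker writes only its own slot. One common reason a separate pool avoids deadlocks: if the device tasks were submitted to a shared pool whose threads are already blocked waiting on this loop, no thread would be free to run them.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical per-device output. It is default-constructible, so the results
// vector can be allocated before any work starts.
struct DeviceResult {
  std::vector<int64_t> sorted_ids;  // toy stand-in for extracted + sorted ids
  int64_t sum = 0;                  // toy stand-in for grouping/bucketing output
};

// Hypothetical per-device pipeline. Assumes non-negative ids and a toy
// sharding rule (id % num_devices) to decide which device owns an id.
DeviceResult ProcessDevice(int device, int num_devices,
                           const std::vector<int64_t>& ids) {
  DeviceResult result;
  for (int64_t id : ids) {
    if (id % num_devices == device) result.sorted_ids.push_back(id);
  }
  std::sort(result.sorted_ids.begin(), result.sorted_ids.end());
  result.sum = std::accumulate(result.sorted_ids.begin(),
                               result.sorted_ids.end(), int64_t{0});
  return result;
}

// Runs ProcessDevice for every device on a dedicated set of worker threads
// (the assumed "separate pool"), independent of any shared executor.
std::vector<DeviceResult> ProcessAllDevices(const std::vector<int64_t>& ids,
                                            int num_devices, int num_threads) {
  assert(num_devices > 0 && num_threads > 0);
  // One default-constructed slot per device; each slot is written by exactly
  // one worker, so no lock is needed on the results.
  std::vector<DeviceResult> results(num_devices);
  std::atomic<int> next_device{0};

  std::vector<std::thread> workers;
  workers.reserve(num_threads);
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&] {
      // Each worker claims the next unprocessed device index until none remain.
      for (int d = next_device.fetch_add(1); d < num_devices;
           d = next_device.fetch_add(1)) {
        results[d] = ProcessDevice(d, num_devices, ids);
      }
    });
  }
  // Block until every device is done; the lambdas capture locals by reference.
  for (std::thread& worker : workers) worker.join();
  return results;
}
```

For example, `ProcessAllDevices({7, 3, 8, 1, 4, 6, 2, 5, 0}, 3, 2)` assigns `{0, 3, 6}` to device 0, `{1, 4, 7}` to device 1, and `{2, 5, 8}` to device 2. Pulling device indices from an atomic counter keeps the scheduling overhead to one atomic increment per device, which fits the expectation of a small fixed scheduling cost.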

PiperOrigin-RevId: 826509091
copybara-service bot changed the title from "[JAX SC] Parallelize device loop for extraction, sorting and grouping." to "[JAX SC] Parallelize device loop for extraction, sorting, grouping, bucketing and buffer filling." on Nov 4, 2025.