@EdwinaZhu EdwinaZhu (Collaborator) commented Dec 15, 2025

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

This PR adds job-level CPU usage observability for streaming jobs by aggregating existing actor-level Tokio poll duration metrics.

Previously, CPU-related metrics were only available at the fragment level (#23747), which made it hard to understand resource usage from a job perspective. This PR uses PromQL to aggregate poll duration from the fragment level to the job level, and updates the Grafana dashboards under the Streaming section.

The goal is to provide a lightweight and practical view of CPU usage at the job level, helping users quickly identify CPU-heavy streaming jobs and diagnose performance issues.
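
As a rough illustration of the fragment-to-job aggregation described above, here is a minimal Python sketch. It is not the PromQL used in this PR. It only mirrors the general shape of a `sum by (...) (rate(...))` query: take the per-fragment rate of a cumulative poll-duration counter, then sum those rates per job. The `FragmentSample` type, its field names, and `job_cpu_cores` are hypothetical names chosen for this example, and treating poll time as CPU time is an approximation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentSample:
    """One scrape of a cumulative poll-duration counter (hypothetical shape)."""
    job_id: int
    fragment_id: int
    timestamp_s: float
    poll_seconds_total: float


def job_cpu_cores(earlier, later):
    """Sum per-fragment poll-time rates into an approximate per-job core count.

    `earlier` and `later` are iterables of FragmentSample taken at two scrapes.
    A result of 1.0 means the job's fragments spent about one second in
    poll per wall-clock second between the two scrapes.
    """
    start = {(s.job_id, s.fragment_id): s for s in earlier}
    usage = defaultdict(float)
    for end in later:
        begin = start.get((end.job_id, end.fragment_id))
        if begin is None:
            continue  # fragment appeared after the first scrape
        elapsed = end.timestamp_s - begin.timestamp_s
        delta = end.poll_seconds_total - begin.poll_seconds_total
        if elapsed <= 0 or delta < 0:
            continue  # skip out-of-order scrapes and counter resets
        usage[end.job_id] += delta / elapsed
    return dict(usage)
```

Sorting the returned mapping by value in descending order gives the kind of "heaviest jobs first" view that a job-level panel is meant to surface.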

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • I have checked the Release Timeline and Currently Supported Versions to determine which release branches I need to cherry-pick this PR into.

Documentation

  • My PR needs documentation updates.
Release note

@EdwinaZhu EdwinaZhu changed the title from "feat(observability) add job level streaming cpu profilling" to "feat(observability): add job level streaming cpu profilling" Dec 15, 2025
@EdwinaZhu EdwinaZhu changed the title from "feat(observability): add job level streaming cpu profilling" to "feat(stream, observability): aggregate fragment-level tokio poll duration as job-level cpu usage" Dec 15, 2025
@EdwinaZhu EdwinaZhu force-pushed the job-level-streaming-cpu-profilling branch from c2b1f83 to 04b54aa Compare December 15, 2025 07:51
@EdwinaZhu EdwinaZhu changed the title from "feat(stream, observability): aggregate fragment-level tokio poll duration as job-level cpu usage" to "feat(observability): add job level streaming cpu profilling" Dec 15, 2025
@EdwinaZhu EdwinaZhu marked this pull request as ready for review December 15, 2025 07:52
@EdwinaZhu EdwinaZhu requested a review from kwannoel December 15, 2025 07:52

@kwannoel kwannoel (Contributor) left a comment

Could you share some screenshots of how this panel looks?

@EdwinaZhu EdwinaZhu (Collaborator, Author) commented

#23747

[Two screenshots of the panel]

@kwannoel this is the screenshot of the panel.

@EdwinaZhu EdwinaZhu force-pushed the job-level-streaming-cpu-profilling branch from 04b54aa to 6f3afdd Compare December 15, 2025 10:50
@EdwinaZhu EdwinaZhu requested a review from a team as a code owner December 15, 2025 10:50
@EdwinaZhu EdwinaZhu requested review from BugenZhao and removed request for a team December 15, 2025 10:50
@EdwinaZhu EdwinaZhu marked this pull request as draft December 15, 2025 10:51
@EdwinaZhu EdwinaZhu force-pushed the job-level-streaming-cpu-profilling branch from 6f3afdd to aa6a206 Compare December 15, 2025 10:52
@EdwinaZhu EdwinaZhu marked this pull request as ready for review December 15, 2025 10:52
@EdwinaZhu EdwinaZhu requested a review from kwannoel December 15, 2025 10:53
@EdwinaZhu EdwinaZhu closed this Dec 16, 2025