Skip to content

Conversation

Pokom
Copy link
Contributor

@Pokom Pokom commented Sep 24, 2025

First step in enabling cost attribution for agents within Synthetic Monitoring. This first step updates the protobuf definition of a check so that the changes can be brought into upstream repositories that will propogate the changes across checks.

This does not update any of the code to handle cost attribution labels. That will come in a future PR.

@Pokom Pokom requested a review from The-9880 September 24, 2025 18:54
@Pokom Pokom requested a review from a team as a code owner September 24, 2025 18:54
@Pokom Pokom requested a review from d0ugal September 24, 2025 18:54
@Pokom Pokom force-pushed the feat/add-cost-attribution-labels-to-checks branch from 8a788e5 to 8307377 Compare September 24, 2025 19:28
First step in enabling cost attribution for agents within Synthetic
Monitoring. This first step updates the protobuf definition of a check
so that the changes can be brought into upstream repositories that will
propogate the changes across checks.

This does not update any of the code to handle cost attribution labels.
That will come in a future PR.

- refs grafana/synthetic-monitoring#370
@Pokom Pokom force-pushed the feat/add-cost-attribution-labels-to-checks branch from 8307377 to 4d36caf Compare September 24, 2025 19:35
string job = 11 [(gogoproto.jsontag) = "job"]; // name of job this check belongs to
bool basicMetricsOnly = 12 [(gogoproto.jsontag) = "basicMetricsOnly"]; // include only basic metrics
string alertSensitivity = 13 [(gogoproto.jsontag) = "alertSensitivity"];
repeated Label costAttributionLabels = 14 [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we considered the pros and cons of having this associated with the tenant instead? Conceptually I think that would be more accurate. This also leads me to a comment I wanted to make about the process that is triggered when a CAL is set/updated:
Based on some diagrams I've seen, I believe, our initial approach is to, once a CAL is set, update all the checks for that tenant and explicitly set a label for them with value _missing_ in case it's missing, or do nothing if it's already set for the check.
I have some doubts on whether this should work that way. Here's one example where we can find "conflicts":
(let's imagine we have a single CAL)

  • Admin sets CAL to team
  • We update all checks to include the team label, setting it to _missing_ if it does not exist
  • Admin updates CAL to service
  • We update all checks to include the service label, setting it to _missing_ if it does not exist

In this case, when updating CAL, what do we do with the cases of team labels that have value _missing_? Do we plan to update them? Otherwise users will have checks that have new labels set by us which no longer have any function.
This whole "explicit" manipulation of the check labels seems odd to me.
The way I understand CALs and their impact on our checks and check telemetry, is that CALs behave as a "mask".
In the UI we can use the CALs as a mask with the check labels to show users that CAL labels are unset.
In the agent we can use the CALs as a mask with the check labels to know when a check has/has not the CAL labels set, and therefore set their value for telemetry purposes to _missing_ as that's the main use case we are interested so far, the telemetry.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I considered having the CAL stored directly on the tenant.
Before diving in to that, let me at least expand upon your questions if we stuck with this approach:

In this case, when updating CAL, what do we do with the cases of team labels that have value missing? Do we plan to update them? Otherwise users will have checks that have new labels set by us which no longer have any function.
That's how we would have handle that scenario, otherwise we end up in a state where data is just not accurate.

Coming back, let me trace through some of the code and share what I see(read, mostly my way of thinking outloud and getting some confirmation). The main reason to have it on the check was to track executions, but I suppose if we're leaning in on using labels directly, we may be able to pivot. One limitation was the data available on a check when telemeter.AddTelemetry is called. The scraper struct only has the probe and check, and the check only has access to the TenantID.

So what I'm not seeing is how to propogate cost attributation label data from the tenant to the checks. There is a channel for tenants that seems to sync some tenant info in checks.handleBatchChanges. Is there a clear path that you see that could allow us to wire the defined cost attribution labels from the tenant to the checks? If so, I can see storing the data at the tenant level being more palatable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One limitation was the data available on a check when telemeter.AddTelemetry is called. The scraper struct only has the probe and check, and the check only has access to the TenantID.

I think we can just add it. For example see the implementation of TenantLimits, which is passed to the scraper here. We could follow a similar pattern for some "CALProvider" that uses the shared tenant's cache in the agent to retrieve the CALs.

@Pokom
Copy link
Contributor Author

Pokom commented Oct 7, 2025

I'm closing this out in favor of #1544, which adds the cost attribution label to the tenant. Thanks @ka3de for the nudge on how to propagate the cost attribution labels over to the checks. The main benefit to having this on the tenant vs checks is that when we need to propagate changes from CMAB => Agent, we only need to make one database call to update the tenant. The original design, we would need to update all of the checks for a tenant on each change.

@Pokom Pokom closed this Oct 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants