shivaspeaks
Member

@shivaspeaks shivaspeaks commented Jul 7, 2025

Implements gRFC A85 (grpc/proposal#454).

@shivaspeaks shivaspeaks changed the title from "ORCA to LRS propagation changes" to "xds: ORCA to LRS propagation changes" on Jul 8, 2025
Member

@ejona86 ejona86 left a comment

I'd have to look at some details more closely, but it is mostly just plumbing.

@@ -25,6 +25,7 @@
import com.google.common.collect.Sets;
import io.grpc.Internal;
import io.grpc.Status;
import io.grpc.xds.BackendMetricPropagation;
Member

io.grpc.xds.client can't depend on io.grpc.xds. We moved client into its own package so it could be used without the rest of grpc.

@@ -420,6 +421,29 @@ public void run() {
return loadCounter;
}

@Override
public LoadStatsManager2.ClusterLocalityStats addClusterLocalityStats(
Member

Shouldn't the old method just call this one with backendMetricPropagation set to null? (Feel free to do that in XdsClient.java)
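
A minimal sketch of the delegation suggested above, assuming simplified stand-in types. `StatsClientSketch` and `PropagationConfigSketch` are hypothetical names used only for illustration; they are not the real `XdsClient` or `BackendMetricPropagation` signatures.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the PR's propagation config; not the real class. */
final class PropagationConfigSketch {
  final String description;

  PropagationConfigSketch(String description) {
    this.description = description;
  }
}

/** Hypothetical stand-in for a stats-producing client; not the real XdsClient API. */
class StatsClientSketch {
  final List<String> calls = new ArrayList<>();

  /** Old overload: kept for existing callers, delegates with a null config. */
  String addClusterLocalityStats(String cluster, String locality) {
    return addClusterLocalityStats(cluster, locality, null);
  }

  /** New overload: a null config means no propagation config was supplied. */
  String addClusterLocalityStats(
      String cluster, String locality, PropagationConfigSketch config) {
    String entry =
        cluster + "/" + locality + "/" + (config == null ? "default" : config.description);
    calls.add(entry);
    return entry;
  }
}
```

With this shape the real logic lives in one method, so the old overload cannot drift from the new one.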

Comment on lines 806 to 809
Map<String, Struct> filterMetadata, @Nullable BackendMetricPropagation backendMetricPropagation,
@Nullable OutlierDetection outlierDetection, Object endpointLbConfig,
LoadBalancerRegistry lbRegistry, Map<String,
Map<Locality, Integer>> prioritizedLocalityWeights, List<DropOverload> dropOverloads) {
Contributor

nit: Can we add the backendMetricPropagation param at the end of the methods, for consistency?

Member Author

I'll move it to just after outlierDetection, so that all the arguments taken from ClusterState stay together, followed by the others.

Comment on lines 429 to 443
if (memUtilization > 0) {
  boolean shouldPropagate = true;
  if (backendMetricPropagation != null) {
    shouldPropagate = backendMetricPropagation.propagateMemUtilization;
  }

  if (shouldPropagate) {
    String metricName = "mem_utilization";
    if (!loadMetricStatsMap.containsKey(metricName)) {
      loadMetricStatsMap.put(metricName, new BackendLoadMetricStats(1, memUtilization));
    } else {
      loadMetricStatsMap.get(metricName).addMetricValueAndIncrementRequestsFinished(memUtilization);
    }
  }
}
Contributor

Can this be extracted into a separate function?
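
One possible shape for such a helper, as a sketch only. `MetricStatsSketch` is a simplified stand-in for `BackendLoadMetricStats`, and `recordMetric` is a hypothetical name; the caller is assumed to compute `shouldPropagate` from the propagation config before calling it.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified stand-in for BackendLoadMetricStats; the fields are assumptions. */
final class MetricStatsSketch {
  long numRequestsFinished;
  double totalMetricValue;

  MetricStatsSketch(long numRequestsFinished, double totalMetricValue) {
    this.numRequestsFinished = numRequestsFinished;
    this.totalMetricValue = totalMetricValue;
  }

  void addMetricValueAndIncrementRequestsFinished(double value) {
    numRequestsFinished++;
    totalMetricValue += value;
  }
}

final class TopLevelMetricRecorderSketch {
  final Map<String, MetricStatsSketch> loadMetricStatsMap = new HashMap<>();

  /** Hypothetical helper shared by every top-level metric instead of one copy per metric. */
  void recordMetric(String metricName, double value, boolean shouldPropagate) {
    // Same conditions as the inline block: skip non-positive values and disabled metrics.
    if (value <= 0 || !shouldPropagate) {
      return;
    }
    MetricStatsSketch existing = loadMetricStatsMap.get(metricName);
    if (existing == null) {
      loadMetricStatsMap.put(metricName, new MetricStatsSketch(1, value));
    } else {
      existing.addMetricValueAndIncrementRequestsFinished(value);
    }
  }
}
```

Each metric then becomes a single call, for example `recordMetric("mem_utilization", memUtilization, shouldPropagate)`.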

@shivaspeaks shivaspeaks requested a review from ejona86 August 4, 2025 16:54
Contributor

@kannanjgithub kannanjgithub left a comment

.

*/
public synchronized void recordBackendLoadMetricStats(Map<String, Double> namedMetrics) {
// If no propagation configuration is set, use the old behavior (propagate everything)
Contributor

This should be done only when the feature is not enabled. If the feature is enabled, we should propagate everything only when * is specified for named_metrics.
Prefixing with "named_metrics" should also happen only when the feature is enabled.
The same applies to recordTopLevelMetrics.

Member Author

This is handled by the current code. The methods in BackendMetricPropagation are implemented to take care of these cases.
However, I now see that the case where the feature is enabled but no backendMetricPropagation config is available creates a problem. It is better to check whether the feature is enabled instead of null-checking backendMetricPropagation. I'll refactor so that the normal path and the feature-enabled path are clearly separate.

I believe recordTopLevelMetrics works fine as is.
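
The two paths discussed in this thread could be separated roughly as follows. This is a sketch under stated assumptions, not the PR's code: `NamedMetricFilterSketch` is a hypothetical class, the feature flag is passed in as a plain boolean, and an enabled feature with an empty allow-list is assumed to propagate no named metrics.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical filter illustrating the legacy path and the feature-enabled path. */
final class NamedMetricFilterSketch {
  private final boolean featureEnabled;
  private final boolean propagateAllNamedMetrics; // assumed true when "*" is configured
  private final Set<String> allowedNamedMetrics;

  NamedMetricFilterSketch(
      boolean featureEnabled, boolean propagateAllNamedMetrics, Set<String> allowedNamedMetrics) {
    this.featureEnabled = featureEnabled;
    this.propagateAllNamedMetrics = propagateAllNamedMetrics;
    this.allowedNamedMetrics = allowedNamedMetrics;
  }

  /** Returns the metrics to report, keyed by the name that would be reported. */
  Map<String, Double> select(Map<String, Double> namedMetrics) {
    Map<String, Double> selected = new LinkedHashMap<>();
    for (Map.Entry<String, Double> entry : namedMetrics.entrySet()) {
      if (!featureEnabled) {
        // Legacy path: propagate everything, with no prefix.
        selected.put(entry.getKey(), entry.getValue());
      } else if (propagateAllNamedMetrics || allowedNamedMetrics.contains(entry.getKey())) {
        // Feature-enabled path: filter by config and add the prefix.
        selected.put("named_metrics." + entry.getKey(), entry.getValue());
      }
    }
    return selected;
  }
}
```

Branching on the flag first means a missing config never falls through to the legacy behavior by accident.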

Contributor

@kannanjgithub kannanjgithub left a comment

Needs work on unit tests.

*/
@Override
public void onLoadReport(MetricReport report) {
stats.recordTopLevelMetrics(
Contributor

It seems better to move the feature guard check from inside stats.recordTopLevelMetrics to here, for more clarity.
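
A rough sketch of what hoisting the guard to the call site might look like. All names here (`GuardedListenerSketch`, `TopLevelStatsSketch`, and the boolean flag) are hypothetical, and the assumption that the feature-disabled path records no top-level metrics is mine, not something stated in the PR.

```java
/** Hypothetical stats sink that no longer checks the feature flag itself. */
final class TopLevelStatsSketch {
  int recordCalls;
  double lastCpuUtilization;

  void recordTopLevelMetrics(double cpu, double mem, double application) {
    recordCalls++;
    lastCpuUtilization = cpu;
  }
}

/** Hypothetical listener that owns the feature guard, so the branch is visible to callers. */
final class GuardedListenerSketch {
  private final boolean featureEnabled;
  private final TopLevelStatsSketch stats;

  GuardedListenerSketch(boolean featureEnabled, TopLevelStatsSketch stats) {
    this.featureEnabled = featureEnabled;
    this.stats = stats;
  }

  void onLoadReport(double cpu, double mem, double application) {
    // Guard at the call site: with the feature off, top-level metrics are not recorded.
    if (featureEnabled) {
      stats.recordTopLevelMetrics(cpu, mem, application);
    }
  }
}
```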

@@ -398,7 +398,7 @@ public void dynamicCluster() {
ClusterResolverConfig childLbConfig = (ClusterResolverConfig) childBalancer.config;
assertThat(childLbConfig.discoveryMechanism).isEqualTo(
DiscoveryMechanism.forEds(
- clusterName, EDS_SERVICE_NAME, null, null, null, Collections.emptyMap(), null));
+ clusterName, EDS_SERVICE_NAME, null, null, null, Collections.emptyMap(), null, null));
Contributor

We should set a backendMetricPropagation in ClusterResource and assert that it is present in the discovery mechanism in the child LB config.

@@ -140,16 +140,16 @@ public class ClusterResolverLoadBalancerTest {
FailurePercentageEjection.create(100, 100, 100, 100));
private final DiscoveryMechanism edsDiscoveryMechanism1 =
DiscoveryMechanism.forEds(CLUSTER1, EDS_SERVICE_NAME1, LRS_SERVER_INFO, 100L, tlsContext,
- Collections.emptyMap(), null);
+ Collections.emptyMap(), null, null);
Contributor

We should pass a backendMetricPropagation to acceptResolvedAddresses and assert that it ends up in the DiscoveryMechanism for both the EDS and logical DNS cluster types.

@@ -1241,8 +1242,9 @@ public ClusterDropStats addClusterDropStats(
@Override
public ClusterLocalityStats addClusterLocalityStats(
Contributor

We should assert that the top-level metrics are updated when the feature is enabled (and vice versa) in the @Test recordLoadStats.

@@ -98,13 +101,20 @@ private synchronized void releaseClusterDropCounter(

/**
* Gets or creates the stats object for recording loads for the specified locality (in the
Contributor

Add unit tests to cover the new changes in LoadStatsManager2Test.

syncContext.execute(new Runnable() {
  @Override
  public void run() {
    serverLrsClientMap.get(serverInfo).startLoadReporting();
  }
});

Contributor

nit: Remove empty line.
