Conversation

@eshitachandwani
Member

@eshitachandwani eshitachandwani commented Nov 17, 2025

This change is part of the A74 implementation.

This PR removes the listener and route watchers from the resolver. The resolver now gets its resources from the xDS dependency manager instead.

RELEASE NOTES: N/A

@eshitachandwani eshitachandwani added the labels "Type: Internal Cleanup" (Refactors, etc) and "Area: xDS" (Includes everything xDS related, including LB policies used with xDS.) Nov 17, 2025
@codecov

codecov bot commented Nov 17, 2025

Codecov Report

❌ Patch coverage is 79.31034% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.40%. Comparing base (76c67d1) to head (3f87474).

Files with missing lines | Patch % | Lines
internal/xds/resolver/xds_resolver.go | 76.00% | 4 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8711      +/-   ##
==========================================
+ Coverage   83.32%   83.40%   +0.07%     
==========================================
  Files         419      418       -1     
  Lines       32427    32331      -96     
==========================================
- Hits        27021    26966      -55     
+ Misses       4033     4006      -27     
+ Partials     1373     1359      -14     
Files with missing lines | Coverage Δ
...ernal/xds/clients/xdsclient/clientimpl_watchers.go | 93.54% <100.00%> (+0.44%) ⬆️
internal/xds/resolver/xds_resolver.go | 89.71% <76.00%> (+7.28%) ⬆️

... and 21 files with indirect coverage changes

@eshitachandwani eshitachandwani added this to the 1.78 Release milestone Nov 17, 2025
@eshitachandwani eshitachandwani requested review from arjan-bal and easwars and removed request for easwars November 17, 2025 03:27
Comment on lines 72 to 75
// All the fields below are protected by mu.
mu sync.Mutex
stopped bool
// All the fields below are accessed only from the callback serializer.
serializer *grpcsync.CallbackSerializer
serializerCancel context.CancelFunc
Contributor

Based on the PR description, this is being done because the xDS client is calling back into the watcher while the watcher is being registered. Can we fix this by instead having the xDS client use its serializer to schedule the call?

		c.serializer.TrySchedule(func(context.Context) {
			watcher.ResourceError(fmt.Errorf("authority %q not found in bootstrap config for resource %q", n.Authority, resourceName), func() {})
		})
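
To illustrate the suggestion above, here is a minimal, self-contained sketch of why deferring the error callback through a serializer avoids calling back into a watcher that is still registering. Everything in it is hypothetical: `serializer`, `watcher`, `client`, `watchResource`, and `registerAndWait` are simplified stand-ins, not the grpcsync.CallbackSerializer or xDS client APIs, and it assumes the watcher holds its own mutex during registration.

```go
package main

import (
	"fmt"
	"sync"
)

// serializer is a hypothetical, minimal callback serializer: it runs
// scheduled callbacks one at a time, in FIFO order, on its own goroutine.
type serializer struct {
	callbacks chan func()
	done      chan struct{}
}

func newSerializer() *serializer {
	s := &serializer{callbacks: make(chan func(), 16), done: make(chan struct{})}
	go func() {
		defer close(s.done)
		for cb := range s.callbacks {
			cb()
		}
	}()
	return s
}

// schedule queues cb to run later on the serializer goroutine.
func (s *serializer) schedule(cb func()) { s.callbacks <- cb }

// close stops accepting callbacks and waits for queued ones to finish.
func (s *serializer) close() {
	close(s.callbacks)
	<-s.done
}

// watcher is a hypothetical caller whose callbacks take its own lock.
type watcher struct {
	mu      sync.Mutex
	lastErr error
}

func (w *watcher) resourceError(err error) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.lastErr = err
}

// client is a hypothetical xDS-client-like type that owns a serializer.
type client struct{ s *serializer }

// watchResource reports a registration failure asynchronously. Calling
// w.resourceError inline here would deadlock a caller that holds w.mu.
func (c *client) watchResource(name string, w *watcher) {
	c.s.schedule(func() {
		w.resourceError(fmt.Errorf("authority not found for resource %q", name))
	})
}

// registerAndWait registers a watch while holding the watcher's lock and
// returns the error that the deferred callback delivered.
func registerAndWait() error {
	c := &client{s: newSerializer()}
	w := &watcher{}

	w.mu.Lock()
	c.watchResource("listener-1", w) // returns immediately; callback is queued
	w.mu.Unlock()

	c.s.close() // wait for the queued callback to run
	w.mu.Lock()
	defer w.mu.Unlock()
	return w.lastErr
}

func main() {
	fmt.Println(registerAndWait())
}
```

The queued callback only acquires the watcher's lock after registration has released it, so the registering goroutine never re-enters its own critical section.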

Member Author

Right! Changed.

@eshitachandwani eshitachandwani changed the title internal:xds change xds_resolver to use dependency manager internal/xds: change xds_resolver to use dependency manager Nov 17, 2025
@easwars
Contributor

easwars commented Nov 17, 2025

Is the PR description out of date? I don't see any changes about switching the dependency manager to use the serializer again.

}
}

// Tests the case where a resource, present in cache, returned by the
Contributor

Is it possible to continue to test this scenario here? Or is this completely covered by tests in the dependency manager? I guess if there is an ambient error (with the dependency manager in the picture), that error is swallowed by it and the resolver does not see it.

So, maybe we need a test to ensure that the resolver does not see an update in this case? And that RPCs continue to succeed (if they were succeeding earlier)?
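
The behaviour the suggested test would check can be sketched as follows. This is a toy model that takes the guess above at face value (the dependency manager records ambient errors and does not forward them); `config`, `resolverStub`, `depManager`, and `scenario` are hypothetical names and do not mirror the real dependency manager or resolver types.

```go
package main

import (
	"errors"
	"fmt"
)

// config is a hypothetical stand-in for the aggregated xDS configuration.
type config struct{ cluster string }

// resolverStub records every update that reaches the resolver.
type resolverStub struct {
	updates []config
}

func (r *resolverStub) update(c config) { r.updates = append(r.updates, c) }

// depManager is a hypothetical dependency manager that sits between the
// xDS client and the resolver.
type depManager struct {
	r       *resolverStub
	ambient []error // ambient errors kept here, never forwarded
}

// onResource forwards a good configuration to the resolver.
func (d *depManager) onResource(c config) { d.r.update(c) }

// onAmbientError swallows the error (assumption based on the comment above).
func (d *depManager) onAmbientError(err error) { d.ambient = append(d.ambient, err) }

// scenario delivers one good resource followed by one ambient error and
// reports how many updates the resolver saw and how many errors were swallowed.
func scenario() (updates, swallowed int) {
	r := &resolverStub{}
	d := &depManager{r: r}
	d.onResource(config{cluster: "cluster-a"})
	d.onAmbientError(errors.New("transient ADS stream failure"))
	return len(r.updates), len(d.ambient)
}

func main() {
	updates, swallowed := scenario()
	fmt.Printf("resolver updates: %d, swallowed ambient errors: %d\n", updates, swallowed)
}
```

A real test would additionally assert that RPCs keep succeeding after the ambient error, which this sketch does not model.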

Member Author

The test in the dependency manager is mostly similar. The main difference is that the test in the dependency manager uses the New function, whereas the test here starts with the resolver. Either way, adding it does no harm, so I re-added it.

Comment on lines +66 to +68
c.serializer.TrySchedule(func(context.Context) {
watcher.ResourceError(fmt.Errorf("ResourceType implementation for resource type url %v is not found", rType.TypeURL), func() {})
})
Contributor

Are you fixing a bug in the existing code here? If so, did you get a chance to check the PR where this was added to see if there was in fact a reason for calling the watch callback outside of the serializer here?

Also, while you are here, it might be worth updating the docstring of this method and of the ResourceWatcher interface: neither mentions the serialization guarantees that the xDS client provides when calling methods on this interface. The docstring says the watch will fail if the resource type implementation does not exist, but what does failing a watch mean? It also says nothing about the case where the authority in the resource name is not found in the configuration (and it should not say "bootstrap", since this is the generic xDS client and bootstrap is a gRPC xDS client concept). You could even do both the bug fix and the docstring fixes as a separate PR.

Comment on lines 221 to 227
// All methods on the xdsResolver type except for the ones invoked by gRPC,
// i.e ResolveNow() and Close(), are guaranteed to execute in the context of
// this serializer's callback. And since the serializer guarantees mutual
// exclusion among these callbacks, we can get by without any mutexes to
// access all of the below defined state. The only exception is Close(),
// which does access some of this shared state, but it does so after
// cancelling the context passed to the serializer.
Contributor

Without this comment, nothing documents which fields need synchronized access and which are set up at creation time and are read-only after that (and therefore don't need any synchronization).

Member Author

Sorry, it got deleted by mistake. Added again.

// - prunes active clusters and pushes a new service config to the channel.
// - updates the current config selector used by the resolver.
func (r *xdsResolver) Update(config *xdsresource.XDSConfig) {
r.serializer.TrySchedule(func(context.Context) {
Contributor

Is there any need for this to be run in a blocking fashion?

Member Author

Yes, because of the shared state in r.curConfigSelector.
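
As a rough illustration of the point about shared state, the sketch below funnels every update through a single goroutine so that the selector field is never touched concurrently and needs no mutex. The `resolver` type, its fields, and `applyConcurrently` are hypothetical simplifications, not the actual xdsResolver or grpcsync code.

```go
package main

import (
	"fmt"
	"sync"
)

// resolver is a hypothetical type whose mutable state is only ever touched
// from the single goroutine that drains the work channel.
type resolver struct {
	work chan func()
	done chan struct{}

	// Accessed only from the serializer goroutine; no mutex needed.
	curSelector string
	generation  int
}

func newResolver() *resolver {
	r := &resolver{work: make(chan func()), done: make(chan struct{})}
	go func() {
		defer close(r.done)
		for cb := range r.work {
			cb()
		}
	}()
	return r
}

// update may be called from any goroutine; the mutation itself is serialized.
func (r *resolver) update(selector string) {
	r.work <- func() {
		r.curSelector = selector
		r.generation++
	}
}

// close drains pending work and returns the number of applied updates.
// Reading generation here is safe because the serializer goroutine has exited.
func (r *resolver) close() int {
	close(r.work)
	<-r.done
	return r.generation
}

// applyConcurrently issues n updates from n goroutines and reports how many
// were applied.
func applyConcurrently(n int) int {
	r := newResolver()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			r.update(fmt.Sprintf("selector-%d", i))
		}(i)
	}
	wg.Wait()
	return r.close()
}

func main() {
	fmt.Println("updates applied:", applyConcurrently(50))
}
```

Because each mutation runs as a callback on one goroutine, no update is lost even though the callers race with each other.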

// access all of the below defined state. The only exception is Close(),
// which does access some of this shared state, but it does so after
// cancelling the context passed to the serializer.
serializer *grpcsync.CallbackSerializer
Contributor

I guess we also discussed the possibility of reverting to using a mutex here as well instead of a serializer? Did you decide against doing it? Or did you do it and things didn't work?

Member Author

Right, I did try that, but it did not work, mainly because of the sendNewServiceConfig function, which can be called from multiple RPCs.

@easwars easwars assigned eshitachandwani and unassigned easwars and arjan-bal Nov 17, 2025
@eshitachandwani
Member Author

Is the PR description out of date? I don't see any changes about switching the dependency manager to use the serializer again.

Yeah, sorry. I was making that change earlier, but Arjan suggested a different way, so that got changed.

3 participants