internal/xds: change xds_resolver to use dependency manager #8711
base: master
Conversation
Codecov Report
❌ Patch coverage is …
Additional details and impacted files
@@ Coverage Diff @@
## master #8711 +/- ##
==========================================
+ Coverage 83.32% 83.40% +0.07%
==========================================
Files 419 418 -1
Lines 32427 32331 -96
==========================================
- Hits 27021 26966 -55
+ Misses 4033 4006 -27
+ Partials 1373 1359 -14
// All the fields below are protected by mu.
mu sync.Mutex
stopped bool
// All the fields below are accessed only from the callback serializer.
serializer *grpcsync.CallbackSerializer
serializerCancel context.CancelFunc
Based on the PR description, this is being done because the xDS client is calling back into the watcher while the watcher is being registered. Can we fix this by instead having the xDS client use its serializer to schedule the call?
c.serializer.TrySchedule(func(context.Context) {
watcher.ResourceError(fmt.Errorf("authority %q not found in bootstrap config for resource %q", n.Authority, resourceName), func() {})
})
Right! Changed.
Is the PR description out of date? I don't see any changes about switching the dependency manager to use the serializer again.
// Tests the case where a resource, present in cache, returned by the
Is it possible to continue to test this scenario here? Or is this completely covered by tests in the dependency manager? I guess if there is an ambient error (with the dependency manager in the picture), that error is swallowed by it and the resolver does not see it.
So, maybe we need a test to ensure that the resolver does not see an update in this case? And that RPCs continue to succeed (if they were succeeding earlier)?
The test in the dependency manager is mostly similar. The main difference is that the dependency manager's test uses the New function, while the test here starts with the resolver. Either way, adding it does no harm, so I re-added it.
c.serializer.TrySchedule(func(context.Context) {
	watcher.ResourceError(fmt.Errorf("ResourceType implementation for resource type url %v is not found", rType.TypeURL), func() {})
})
Are you fixing a bug in the existing code here? If so, did you get a chance to check the PR where this was added to see if there was in fact a reason for calling the watch callback outside of the serializer here?
Also, while you are here, it might be worth updating the docstring of this method and the ResourceWatcher interface (there is no mention of the serialization guarantees that the xDS client provides when calling methods on this interface). It says the watch will fail if the resource type implementation does not exist, but what does failing a watch mean? It also does not say anything about the case where the authority in the resource name is not found in the configuration (and it should not say bootstrap, since this is the generic xDS client, and bootstrap is a gRPC xDS client concept). You could even do it as a separate PR (both the bug fix and the docstring fixes).
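For illustration, here is a rough sketch of wording such a docstring could adopt. The interface below is a simplified, hypothetical stand-in for the generic xDS client's ResourceWatcher (only ResourceError and its onDone parameter appear in the diffs above), and the serialization wording assumes the client keeps delivering callbacks through its serializer, as discussed in this thread.

// Package sketch holds illustrative wording only; it is not the actual API.
package sketch

// ResourceWatcher is implemented by callers interested in updates to a
// watched xDS resource.
//
// The xDS client invokes all methods on this interface from a single
// callback serializer, so calls are mutually exclusive and delivered in
// order; implementations need no extra locking for state touched only from
// these callbacks.
type ResourceWatcher interface {
	// ResourceError is invoked when the watch cannot be satisfied, for
	// example when no ResourceType implementation is registered for the
	// resource's type URL, or when the authority in the resource name is
	// not present in the client's configuration. onDone must be called once
	// the error has been processed.
	ResourceError(err error, onDone func())
}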
// All methods on the xdsResolver type except for the ones invoked by gRPC,
// i.e ResolveNow() and Close(), are guaranteed to execute in the context of
// this serializer's callback. And since the serializer guarantees mutual
// exclusion among these callbacks, we can get by without any mutexes to
// access all of the below defined state. The only exception is Close(),
// which does access some of this shared state, but it does so after
// cancelling the context passed to the serializer.
Without this comment, there is nothing documenting which fields need synchronized access and which are set up at creation time and read-only after that (and therefore don't need any synchronization).
Sorry, it got deleted by mistake. Added it back.
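As a rough sketch of the grouping that comment documents (field names are illustrative and the struct is trimmed; this is not the resolver's actual field set), continuing in the same hypothetical sketch package:

import (
	"context"

	"google.golang.org/grpc/internal/grpcsync"
)

// configSelector stands in for the resolver's real config selector type.
type configSelector struct{}

// xdsResolverSketch groups fields by their synchronization requirements.
type xdsResolverSketch struct {
	// Set when the resolver is built and read-only afterwards: no
	// synchronization needed.
	dataPlaneAuthority string

	// All methods other than the ones invoked by gRPC (ResolveNow and Close)
	// run as callbacks on the serializer, which guarantees mutual exclusion,
	// so the fields below need no mutex. Close touches this state only after
	// cancelling the context passed to the serializer via serializerCancel.
	serializer        *grpcsync.CallbackSerializer
	serializerCancel  context.CancelFunc
	curConfigSelector *configSelector
}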
// - prunes active clusters and pushes a new service config to the channel.
// - updates the current config selector used by the resolver.
func (r *xdsResolver) Update(config *xdsresource.XDSConfig) {
	r.serializer.TrySchedule(func(context.Context) {
Is there any need for this to be run in a blocking fashion?
Yes, because of the shared state of r.curConfigSelector.
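To make that concrete, a minimal sketch extending the illustrative xdsResolverSketch above (the parameter is simplified from the real *xdsresource.XDSConfig, and the real Update also prunes clusters and pushes a new service config):

// Update hands the new configuration to the serializer rather than touching
// shared state inline: every other read or write of curConfigSelector also
// happens inside a serializer callback, so no mutex is needed.
func (r *xdsResolverSketch) Update(cs *configSelector) {
	r.serializer.TrySchedule(func(context.Context) {
		// Mutually exclusive with all other serializer callbacks.
		r.curConfigSelector = cs
	})
}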
// access all of the below defined state. The only exception is Close(),
// which does access some of this shared state, but it does so after
// cancelling the context passed to the serializer.
serializer *grpcsync.CallbackSerializer
I guess we also discussed the possibility of reverting to using a mutex here as well instead of a serializer? Did you decide against doing it? Or did you do it and things didn't work?
Right, I did try that, but it did not work, mainly because of the sendNewServiceConfig function, which can be called from multiple RPCs.
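For context, a hedged sketch of the situation being described, again extending the illustrative types above (onClusterRefDownToZero and sendNewServiceConfig are stand-ins here, not necessarily the resolver's actual methods): a service-config push can be triggered from RPC goroutines, for example when an RPC releases the last reference to a cluster, so funneling it through the serializer keeps those pushes mutually exclusive with Update without exposing a mutex to the RPC path.

// onClusterRefDownToZero may be called from any RPC goroutine when that RPC
// releases the last reference to a cluster. Scheduling the push on the
// serializer keeps it serialized with Update and the rest of the resolver's
// state without a mutex.
func (r *xdsResolverSketch) onClusterRefDownToZero() {
	r.serializer.TrySchedule(func(context.Context) {
		r.sendNewServiceConfig(r.curConfigSelector)
	})
}

// sendNewServiceConfig is a stub standing in for the real method, which
// prunes inactive clusters and pushes an updated service config to the
// channel.
func (r *xdsResolverSketch) sendNewServiceConfig(*configSelector) {}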
Yeah, sorry, I was making that change earlier, but Arjan suggested a different way, so that got changed.
This change is part of the A74 implementation.
This PR removes the listener and route watchers from the resolver and changes it so that the resources are obtained from the xDS dependency manager.
RELEASE NOTES: N/A