Skip to content

xds: generic xDS client transport channel and ads stream implementation #8144

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

purnesh42H
Copy link
Contributor

@purnesh42H purnesh42H commented Mar 5, 2025

This is the xDS transport channel implementation change that setup an ADS (Aggregated Data Service) stream with xDS management server which can be used by generic xDS client to do resource watching. The ADS stream and xDS channel uses the types and interfaces that were created in #8042. The e2e tests are written using types and interfaces in #8042 and grpctransport created in #8066.

The PR copies the existing ADS stream and channel implementation from internal xdsclient code along with the helpers that are part of gRPC and then modify them to use the generic client types and interfaces. The PR contains 6 commits. Odd commits (1, 3, 5) are doing the exact copy and even commits (2, 4, 6) are modifying their predecessor commit. The copy index is as follows

Copied Implementation (modified to use generic client configs and interfaces)

From To
xds/internal/xdsclient/transport/ads xds/internal/clients/xdsclient/internal/ads
xds/internal/xdsclient/channel.go xds/internal/clients/xdsclient/xdsclient/channel.go
xds/internal/xdsclient/channel_test.go xds/internal/clients/xdsclient/xdsclient/channel_test.go

Copied helpers common (mainly for testing and logging. only what is needed)

From To
internal/backoff xds/internal/clients/internal/backoff
internal/buffer xds/internal/clients/internal/buffer
internal/pretty xds/internal/clients/internal/pretty
internal/testutils/xds/e2e xds/internal/clients/internal/testutils/e2e
internal/testutils/xds/fakeserver xds/internal/clients/internal/testutils/fakeserver
internal/testutils/channel.go xds/internal/clients/internal/testutils/channel.go

Copied helpers xdsclient (mainly for testing and logging. only what is needed)

From To
xds/internal/xdsclient/xdsresource/errors.go xds/internal/clients/xdsclient/internal/xdsresource/errors.go
xds/internal/xdsclient/xdsresource/name.go xds/internal/clients/xdsclient/internal/xdsresource/name.go
xds/internal/xdsclient/xdsresource/type.go xds/internal/clients/xdsclient/internal/xdsresource/type.go
xds/internal/xdsclient/xdsresource/version/version.go xds/internal/clients/xdsclient/internal/xdsresource/version.go

RELEASE NOTES: None

Copy link

codecov bot commented Mar 5, 2025

Codecov Report

Attention: Patch coverage is 69.38588% with 334 lines in your changes missing coverage. Please review.

Project coverage is 82.07%. Comparing base (af07815) to head (78fe33f).
Report is 12 commits behind head on master.

Files with missing lines Patch % Lines
xds/internal/clients/xdsclient/ads_stream.go 74.23% 84 Missing and 26 partials ⚠️
...nal/clients/xdsclient/internal/xdsresource/name.go 15.51% 49 Missing ⚠️
...al/clients/internal/testutils/fakeserver/server.go 60.39% 37 Missing and 3 partials ⚠️
xds/internal/clients/xdsclient/channel.go 77.12% 26 Missing and 9 partials ⚠️
xds/internal/clients/internal/pretty/pretty.go 0.00% 29 Missing ⚠️
.../internal/clients/internal/testutils/e2e/server.go 77.77% 15 Missing and 5 partials ⚠️
xds/internal/clients/internal/backoff/backoff.go 47.22% 18 Missing and 1 partial ⚠️
...clients/internal/testutils/restartable_listener.go 64.10% 13 Missing and 1 partial ⚠️
...internal/clients/internal/testutils/e2e/logging.go 25.00% 6 Missing ⚠️
...l/clients/xdsclient/internal/xdsresource/errors.go 58.33% 4 Missing and 1 partial ⚠️
... and 3 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8144      +/-   ##
==========================================
- Coverage   82.30%   82.07%   -0.24%     
==========================================
  Files         392      410      +18     
  Lines       39132    40251    +1119     
==========================================
+ Hits        32207    33034     +827     
- Misses       5603     5855     +252     
- Partials     1322     1362      +40     
Files with missing lines Coverage Δ
xds/internal/clients/internal/buffer/unbounded.go 100.00% <100.00%> (ø)
...l/clients/internal/syncutil/callback_serializer.go 100.00% <100.00%> (ø)
xds/internal/clients/internal/syncutil/event.go 100.00% <100.00%> (ø)
xds/internal/clients/internal/testutils/channel.go 100.00% <100.00%> (ø)
.../clients/internal/testutils/e2e/clientresources.go 93.93% <93.93%> (ø)
...nal/clients/xdsclient/internal/xdsresource/type.go 50.00% <50.00%> (ø)
...internal/clients/internal/testutils/marshal_any.go 62.50% <62.50%> (ø)
...l/clients/xdsclient/internal/xdsresource/errors.go 58.33% <58.33%> (ø)
...internal/clients/internal/testutils/e2e/logging.go 25.00% <25.00%> (ø)
...clients/internal/testutils/restartable_listener.go 64.10% <64.10%> (ø)
... and 7 more

... and 33 files with indirect coverage changes

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@purnesh42H purnesh42H force-pushed the generic-xds-client-transport-channel-ads-stream branch 2 times, most recently from cfedc55 to 957b0ae Compare March 5, 2025 15:50
@purnesh42H purnesh42H force-pushed the generic-xds-client-transport-channel-ads-stream branch from 957b0ae to 123a5a5 Compare March 6, 2025 04:13
Copy link

linux-foundation-easycla bot commented Mar 6, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@purnesh42H purnesh42H force-pushed the generic-xds-client-transport-channel-ads-stream branch from a349898 to bf86ee7 Compare March 6, 2025 05:28
@purnesh42H purnesh42H force-pushed the generic-xds-client-transport-channel-ads-stream branch from bf86ee7 to 03196ed Compare March 6, 2025 05:37
@purnesh42H purnesh42H added Type: Feature New features or improvements in behavior Area: xDS Includes everything xDS related, including LB policies used with xDS. labels Mar 6, 2025
@purnesh42H purnesh42H added this to the 1.72 Release milestone Mar 6, 2025
@purnesh42H purnesh42H requested review from dfawley and easwars March 6, 2025 08:53
@purnesh42H purnesh42H force-pushed the generic-xds-client-transport-channel-ads-stream branch from 4d7d9ae to fba2d82 Compare March 6, 2025 09:16
@@ -16,17 +16,17 @@
*
*/

// Package grpclog defines logging for grpc.
// Package clientslog defines logging for clients.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should see if there's something better to do for logging. What about slog in the stdlib?

Our only real constraint is that whatever we do needs to work with the existing grpclog stuff when we use these from grpc.

@easwars

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a blocker for PR or it can be tried after e2e working generic xDS client?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Estimate work needed to get a standard library like slog working

//
// This is kept in internal until the gRPC project decides whether or not to
// This is kept in internal until the clients project decides whether or not to
// allow alternative backoff strategies.
package backoff
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Global concern: are we sure we're not pulling in dead code as part of this? AIUI the existing dead code checker only considers unexported functions/symbols.

Maybe we can scale some of these things back to be simpler & smaller given the more limited usages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have pushed 2 commits of removing unused code from common helpers and xdsclient helpers respectively. That has reduced code by around 3000 lines. I think the helpers can still be simplified but I think that will require some refactoring/rewriting. Can we do that as separate PR once we have a working client?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If DefaultExponential is the only thing that is being used by the code, that should be the only thing exported. Everything else can be unexported.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Unexported other structs.

@@ -1,6 +1,6 @@
/*
*
* Copyright 2019 gRPC authors.
* Copyright 2025 gRPC authors.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it's appropriate to change the copyright date on all these files. Let's just leave them alone.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, these are new files because they are copied to new folder. So, won't they need new date?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should see if we really need this and kill it otherwise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its mainly for debugging. If we remove this, we can print the response or resource directly using %v

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

@@ -21,7 +21,7 @@
package httpfilter

import (
iresolver "google.golang.org/grpc/internal/resolver"
iresolver "google.golang.org/grpc/xds/internal/clients/xdsclient/internal/testutils/resolver"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need name resolver stuff in here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed as part of unused code

Comment on lines 33 to 34
ResourceTypeMapForTesting = make(map[string]any)
ResourceTypeMapForTesting[xdsresource.V3ListenerURL] = listenerType
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be simplified:

ResourceTypeMapForTesting = map[string]any{xdsresource.V3ListenerURL: listenerType}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of this makes sense in the context of an xdsclient, and it should be removed. This is all grpc-specific definitions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed as part of unused code

@@ -16,12 +16,12 @@
*
*/

package grpcsync
package clientssync
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe syncutil?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// Note that this differs from FailedPrecondition. It indicates arguments
// that are problematic regardless of the state of the system
// (e.g., a malformed file name).
invalidArgumentErrCode = 3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you leverage existing symbols instead?

https://pkg.go.dev/google.golang.org/genproto/googleapis/rpc/code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

*/

// Package xdsclient contains implementation of the xDS client.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have an xdsclient subdirectory inside xdsclient? And it's not internal?

A goal of this change is to add no new exported symbols at all, or to keep that set very small and well understood. That includes new packages not under internal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i had to move the implementation to a separate package from config structs to avoid cyclic dependency from internal packages which needs xdsclient config types. With this structure, to create a new client, user will have to import .....clients\xdsclient\xdsclient

@@ -62,7 +62,7 @@ func newLoggerV2() LoggerV2 {
warningW := io.Discard
infoW := io.Discard

logLevel := os.Getenv("GRPC_GO_LOG_SEVERITY_LEVEL")
logLevel := os.Getenv("CLIENTS_GO_LOG_SEVERITY_LEVEL")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, what would be the migration strategy here for customers/users internally and externally?

All users would have to update to this new environment variable in order to preserve the user-specified log levels. Would this be fine and/or feasible? Similarly, for all other environment variable changes.

Would there be an overall migration/import plan and strategy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a completely new folder in the grpc-go. Eventually this entire code is planned to move to a new repo. There is no change being done to existing code of internal xds client implementation so unless the user decides to switch to this new implementation, they will not notice a change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So both versions would co-exist for some time? If so, how long until we move the default to the new implementation?

Perhaps an overall migration and/or user adoption strategy should be documented in the design doc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, there are 2 things to note. If you are using internal xds client and don't have a requirement of not having grpc and proto dependencies, you can continue to use the internal xds client and you should not notice any behaviour change. However, if you want to configure your own transport and can't use proto based API(s), it would be good to switch to generic xDS client.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So both versions would co-exist for some time? If so, how long until we move the default to the new implementation?

it has to be fast follow up. Not more than 2-3 weeks of generic xDS client completion.

Comment on lines 62 to 71
if sc.IdentityInstanceName != "" {
if _, ok := bc.CertProviderConfigs()[sc.IdentityInstanceName]; !ok {
return fmt.Errorf("identity certificate provider instance name %q missing in bootstrap configuration", sc.IdentityInstanceName)
}
}
if sc.RootInstanceName != "" {
if _, ok := bc.CertProviderConfigs()[sc.RootInstanceName]; !ok {
return fmt.Errorf("root certificate provider instance name %q missing in bootstrap configuration", sc.RootInstanceName)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my education, why were these security checks deleted? Would it be users could now add their own generic security configs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these files are just copied from internal xdsclient implementation to testutils of external xds client for testing. We don't have to do this kind of testing for external xdsclient so after copy, I have removed it. The only testing required for external client is resource watching scenarios.

@@ -126,43 +128,29 @@ func newXDSChannel(opts xdsChannelOpts) (*xdsChannel, error) {
type xdsChannel struct {
// The following fields are initialized at creation time and are read-only
// after that, and hence need not be guarded by a mutex.
transport transport.Transport // Takes ownership of this transport (used to make streaming calls).
ads *ads.StreamImpl // An ADS stream to the management server.
lrs *lrs.StreamImpl // An LRS stream to the management server.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To confirm for my understanding, this is because we are moving the LRS client functionality away from the xDS client into its own separate LRS client? (related to #8042).

Currently, in the immediate-term, we have no need for LRS, but we would need it in the future. So this is still a very nice to have.

Copy link
Contributor Author

@purnesh42H purnesh42H Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's correct. For external client, LRS will have a separate client.

@purnesh42H purnesh42H requested a review from dfawley March 12, 2025 14:56
@purnesh42H purnesh42H assigned dfawley and unassigned purnesh42H Mar 12, 2025
@purnesh42H purnesh42H force-pushed the generic-xds-client-transport-channel-ads-stream branch from 538590e to f647a58 Compare March 12, 2025 15:05
Copy link
Member

@dfawley dfawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks! Please wait for @easwars to do a pass, too.

@dfawley dfawley removed their assignment Mar 12, 2025
func newXDSChannel(opts xdsChannelOpts) (*xdsChannel, error) {
switch {
case opts.transport == nil:
return nil, errors.New("xdsChannel: transport is nil")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to think about the error prefix used in these errors.

  • does xdsChannel make any sense to the user?
  • how does this error appear to the user? what context does the caller of this function add to the error.
  • do these error strings provide any actionable feedback to the user?

Earlier when this was all part of the internal xds client, these things did not matter so much. But now, if we are propagating errors to the users, they should be meaningful and actionable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added xdsclient as the prefix because that's the top level package name. I think message is straightforward?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user sees an error message xdsChannel: transport is nil, how will they go about handling it. This function newXDSChannel is called by internal code anyways. What I'm trying to say that if this is an internal API which is only called by code that we control, we should either ensure that it is always called with all parameters filled in, or return errors that are meaningful to the user and actionable as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is an internal API which is only called by code that we control, we should either ensure that it is always called with all parameters filled in, or return errors that are meaningful to the user and actionable as well.

Yeah so in the next PR when we implement the top level xDS client constructor, we can ensure that and return actionable error message

@easwars easwars assigned purnesh42H and unassigned easwars Mar 15, 2025
@purnesh42H purnesh42H force-pushed the generic-xds-client-transport-channel-ads-stream branch from b1fc3cb to 902e6d9 Compare March 16, 2025 20:45
@purnesh42H purnesh42H force-pushed the generic-xds-client-transport-channel-ads-stream branch from 902e6d9 to 1d347b6 Compare March 16, 2025 20:45
@purnesh42H purnesh42H requested a review from easwars March 17, 2025 16:16
@purnesh42H purnesh42H assigned easwars and unassigned purnesh42H Mar 17, 2025
//
// This is kept in internal until the gRPC project decides whether or not to
// This is kept in internal until the clients project decides whether or not to
// allow alternative backoff strategies.
package backoff
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If DefaultExponential is the only thing that is being used by the code, that should be the only thing exported. Everything else can be unexported.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doug and I ran into this file (in our current codebase) as part of something else, and thought it was good idea to not do this anymore. So, please remove this file and just do a net.Listen("tcp", "localhost:0") wherever you want to start a new listener.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// ResourceWatchStateForTesting returns the ResourceWatchState for the given
// resource type and name. This is intended for testing purposes only, to
// inspect the internal state of the ADS stream.
func (s *adsStreamImpl) ResourceWatchStateForTesting(typ ResourceType, resourceName string) (resourceWatchState, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unused. Can it be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed


// processClientSideListener checks if the provided Listener proto meets
// the expected criteria. If so, it returns a non-empty routeConfigName.
func processClientSideListener(lis *v3listenerpb.Listener) (*listenerUpdate, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is very little detail in that bug. I personally wouldn't remember what I would need to remember based on that one-liner in a couple of weeks from now.

@easwars easwars assigned purnesh42H and unassigned easwars Mar 17, 2025
Comment on lines 38 to 40
Multiplier float64
// Jitter is the factor with which backoffs are randomized.
Jitter float64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be any validation for Multiplier and Jitter, it is unlikely, but could a user pass in a negative or zero value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have unexported them. Also, this package is internal only so user won't have access to them.

// not actually validating what the server sent and therefore don't
// know that it's invalid. But we shouldn't ACK either, because we
// don't know that it is valid.
s.logger.Warningf("%v", nackErr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this should have a more detailed warning message regarding the server sending something the xDS client does not know about.

Copy link
Contributor Author

@purnesh42H purnesh42H Mar 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the nackErr will contain the reason as well since there can be multiple reasons?

// If there are any errors decoding the resources, the metadata will indicate
// that the update was NACKed, and the returned error will contain information
// about all errors encountered by this function.
func decodeResponse(opts *DecodeOptions, rType *ResourceType, resp response) (map[string]dataAndErrTuple, xdsresource.UpdateMetadata, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this function be thread-safe? My understanding is that this would require synchronization by the caller?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah ResourceType.Decoder needs to be thread safe as well for the function to be thread safe

Comment on lines +58 to +59
// Only the long form URL, with ://, is valid.
return &Name{ID: name}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think there should be a warning level log for this case. Ditto for similar if branches below where we just "silently fallback" and set the field ID to the input.

If the name isn't a valid new-style xDS name, field ID is set to the input. Note that this is not an error, because we still support the old-style resource names (those not starting with "xdstp:").

Copy link
Contributor Author

@purnesh42H purnesh42H Mar 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah as the docstring pointed its not an error. Besides, this is an internal helper function so may be the caller should ensure logging if required?

// The caller can tell if the parsing is successful by checking the returned Scheme.

@purnesh42H purnesh42H force-pushed the generic-xds-client-transport-channel-ads-stream branch from f00537a to 5d6064c Compare March 18, 2025 05:13
@purnesh42H purnesh42H force-pushed the generic-xds-client-transport-channel-ads-stream branch from 5d6064c to 78fe33f Compare March 18, 2025 05:58
@purnesh42H purnesh42H merged commit 1703656 into grpc:master Mar 18, 2025
15 checks passed
janardhanvissa pushed a commit to janardhanvissa/grpc-go that referenced this pull request Mar 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: xDS Includes everything xDS related, including LB policies used with xDS. Type: Feature New features or improvements in behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants