-
Notifications
You must be signed in to change notification settings - Fork 4.5k
xds: generic xDS client transport channel and ads stream implementation #8144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xds: generic xDS client transport channel and ads stream implementation #8144
Conversation
cfedc55
to
957b0ae
Compare
957b0ae
to
123a5a5
Compare
a349898
to
bf86ee7
Compare
bf86ee7
to
03196ed
Compare
4d7d9ae
to
fba2d82
Compare
@@ -16,17 +16,17 @@ | |||
* | |||
*/ | |||
|
|||
// Package grpclog defines logging for grpc. | |||
// Package clientslog defines logging for clients. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should see if there's something better to do for logging. What about slog
in the stdlib?
Our only real constraint is that whatever we do needs to work with the existing grpclog stuff when we use these from grpc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a blocker for PR or it can be tried after e2e working generic xDS client?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Estimate work needed to get a standard library like slog working
// | ||
// This is kept in internal until the gRPC project decides whether or not to | ||
// This is kept in internal until the clients project decides whether or not to | ||
// allow alternative backoff strategies. | ||
package backoff |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Global concern: are we sure we're not pulling in dead code as part of this? AIUI the existing dead code checker only considers unexported functions/symbols.
Maybe we can scale some of these things back to be simpler & smaller given the more limited usages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have pushed 2 commits of removing unused code from common helpers and xdsclient helpers respectively. That has reduced code by around 3000 lines. I think the helpers can still be simplified but I think that will require some refactoring/rewriting. Can we do that as separate PR once we have a working client?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If DefaultExponential
is the only thing that is being used by the code, that should be the only thing exported. Everything else can be unexported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Unexported other structs.
@@ -1,6 +1,6 @@ | |||
/* | |||
* | |||
* Copyright 2019 gRPC authors. | |||
* Copyright 2025 gRPC authors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it's appropriate to change the copyright date on all these files. Let's just leave them alone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, these are new files because they are copied to new folder. So, won't they need new date?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should see if we really need this and kill it otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its mainly for debugging. If we remove this, we can print the response or resource directly using %v
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
@@ -21,7 +21,7 @@ | |||
package httpfilter | |||
|
|||
import ( | |||
iresolver "google.golang.org/grpc/internal/resolver" | |||
iresolver "google.golang.org/grpc/xds/internal/clients/xdsclient/internal/testutils/resolver" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need name resolver stuff in here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed as part of unused code
ResourceTypeMapForTesting = make(map[string]any) | ||
ResourceTypeMapForTesting[xdsresource.V3ListenerURL] = listenerType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be simplified:
ResourceTypeMapForTesting = map[string]any{xdsresource.V3ListenerURL: listenerType}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of this makes sense in the context of an xdsclient, and it should be removed. This is all grpc-specific definitions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed as part of unused code
@@ -16,12 +16,12 @@ | |||
* | |||
*/ | |||
|
|||
package grpcsync | |||
package clientssync |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: maybe syncutil
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// Note that this differs from FailedPrecondition. It indicates arguments | ||
// that are problematic regardless of the state of the system | ||
// (e.g., a malformed file name). | ||
invalidArgumentErrCode = 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you leverage existing symbols instead?
https://pkg.go.dev/google.golang.org/genproto/googleapis/rpc/code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
*/ | ||
|
||
// Package xdsclient contains implementation of the xDS client. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have an xdsclient subdirectory inside xdsclient? And it's not internal?
A goal of this change is to add no new exported symbols at all, or to keep that set very small and well understood. That includes new packages not under internal
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah i had to move the implementation to a separate package from config structs to avoid cyclic dependency from internal packages which needs xdsclient config types. With this structure, to create a new client, user will have to import .....clients\xdsclient\xdsclient
@@ -62,7 +62,7 @@ func newLoggerV2() LoggerV2 { | |||
warningW := io.Discard | |||
infoW := io.Discard | |||
|
|||
logLevel := os.Getenv("GRPC_GO_LOG_SEVERITY_LEVEL") | |||
logLevel := os.Getenv("CLIENTS_GO_LOG_SEVERITY_LEVEL") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, what would be the migration strategy here for customers/users internally and externally?
All users would have to update to this new environment variable in order to preserve the user-specified log levels. Would this be fine and/or feasible? Similarly, for all other environment variable changes.
Would there be an overall migration/import plan and strategy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a completely new folder in the grpc-go. Eventually this entire code is planned to move to a new repo. There is no change being done to existing code of internal xds client implementation so unless the user decides to switch to this new implementation, they will not notice a change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So both versions would co-exist for some time? If so, how long until we move the default to the new implementation?
Perhaps an overall migration and/or user adoption strategy should be documented in the design doc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, there are 2 things to note. If you are using internal xds client and don't have a requirement of not having grpc and proto dependencies, you can continue to use the internal xds client and you should not notice any behaviour change. However, if you want to configure your own transport and can't use proto based API(s), it would be good to switch to generic xDS client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So both versions would co-exist for some time? If so, how long until we move the default to the new implementation?
it has to be fast follow up. Not more than 2-3 weeks of generic xDS client completion.
if sc.IdentityInstanceName != "" { | ||
if _, ok := bc.CertProviderConfigs()[sc.IdentityInstanceName]; !ok { | ||
return fmt.Errorf("identity certificate provider instance name %q missing in bootstrap configuration", sc.IdentityInstanceName) | ||
} | ||
} | ||
if sc.RootInstanceName != "" { | ||
if _, ok := bc.CertProviderConfigs()[sc.RootInstanceName]; !ok { | ||
return fmt.Errorf("root certificate provider instance name %q missing in bootstrap configuration", sc.RootInstanceName) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my education, why were these security checks deleted? Would it be users could now add their own generic security configs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these files are just copied from internal xdsclient implementation to testutils of external xds client for testing. We don't have to do this kind of testing for external xdsclient so after copy, I have removed it. The only testing required for external client is resource watching scenarios.
@@ -126,43 +128,29 @@ func newXDSChannel(opts xdsChannelOpts) (*xdsChannel, error) { | |||
type xdsChannel struct { | |||
// The following fields are initialized at creation time and are read-only | |||
// after that, and hence need not be guarded by a mutex. | |||
transport transport.Transport // Takes ownership of this transport (used to make streaming calls). | |||
ads *ads.StreamImpl // An ADS stream to the management server. | |||
lrs *lrs.StreamImpl // An LRS stream to the management server. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To confirm for my understanding, this is because we are moving the LRS client functionality away from the xDS client into its own separate LRS client? (related to #8042).
Currently, in the immediate-term, we have no need for LRS, but we would need it in the future. So this is still a very nice to have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that's correct. For external client, LRS will have a separate client.
538590e
to
f647a58
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks! Please wait for @easwars to do a pass, too.
func newXDSChannel(opts xdsChannelOpts) (*xdsChannel, error) { | ||
switch { | ||
case opts.transport == nil: | ||
return nil, errors.New("xdsChannel: transport is nil") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to think about the error prefix used in these errors.
- does
xdsChannel
make any sense to the user? - how does this error appear to the user? what context does the caller of this function add to the error.
- do these error strings provide any actionable feedback to the user?
Earlier when this was all part of the internal xds client, these things did not matter so much. But now, if we are propagating errors to the users, they should be meaningful and actionable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added xdsclient as the prefix because that's the top level package name. I think message is straightforward?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the user sees an error message xdsChannel: transport is nil
, how will they go about handling it. This function newXDSChannel
is called by internal code anyways. What I'm trying to say that if this is an internal API which is only called by code that we control, we should either ensure that it is always called with all parameters filled in, or return errors that are meaningful to the user and actionable as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is an internal API which is only called by code that we control, we should either ensure that it is always called with all parameters filled in, or return errors that are meaningful to the user and actionable as well.
Yeah so in the next PR when we implement the top level xDS client constructor, we can ensure that and return actionable error message
b1fc3cb
to
902e6d9
Compare
902e6d9
to
1d347b6
Compare
// | ||
// This is kept in internal until the gRPC project decides whether or not to | ||
// This is kept in internal until the clients project decides whether or not to | ||
// allow alternative backoff strategies. | ||
package backoff |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If DefaultExponential
is the only thing that is being used by the code, that should be the only thing exported. Everything else can be unexported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doug and I ran into this file (in our current codebase) as part of something else, and thought it was good idea to not do this anymore. So, please remove this file and just do a net.Listen("tcp", "localhost:0")
wherever you want to start a new listener.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// ResourceWatchStateForTesting returns the ResourceWatchState for the given | ||
// resource type and name. This is intended for testing purposes only, to | ||
// inspect the internal state of the ADS stream. | ||
func (s *adsStreamImpl) ResourceWatchStateForTesting(typ ResourceType, resourceName string) (resourceWatchState, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unused. Can it be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
|
||
// processClientSideListener checks if the provided Listener proto meets | ||
// the expected criteria. If so, it returns a non-empty routeConfigName. | ||
func processClientSideListener(lis *v3listenerpb.Listener) (*listenerUpdate, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is very little detail in that bug. I personally wouldn't remember what I would need to remember based on that one-liner in a couple of weeks from now.
Multiplier float64 | ||
// Jitter is the factor with which backoffs are randomized. | ||
Jitter float64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would there be any validation for Multiplier
and Jitter
, it is unlikely, but could a user pass in a negative or zero value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have unexported them. Also, this package is internal only so user won't have access to them.
// not actually validating what the server sent and therefore don't | ||
// know that it's invalid. But we shouldn't ACK either, because we | ||
// don't know that it is valid. | ||
s.logger.Warningf("%v", nackErr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this should have a more detailed warning message regarding the server sending something the xDS client does not know about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the nackErr will contain the reason as well since there can be multiple reasons?
// If there are any errors decoding the resources, the metadata will indicate | ||
// that the update was NACKed, and the returned error will contain information | ||
// about all errors encountered by this function. | ||
func decodeResponse(opts *DecodeOptions, rType *ResourceType, resp response) (map[string]dataAndErrTuple, xdsresource.UpdateMetadata, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this function be thread-safe? My understanding is that this would require synchronization by the caller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah ResourceType.Decoder needs to be thread safe as well for the function to be thread safe
// Only the long form URL, with ://, is valid. | ||
return &Name{ID: name} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think there should be a warning level log for this case. Ditto for similar if branches below where we just "silently fallback" and set the field ID
to the input.
If the name isn't a valid new-style xDS name, field ID is set to the input. Note that this is not an error, because we still support the old-style resource names (those not starting with "xdstp:").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah as the docstring pointed its not an error. Besides, this is an internal helper function so may be the caller should ensure logging if required?
// The caller can tell if the parsing is successful by checking the returned Scheme.
f00537a
to
5d6064c
Compare
5d6064c
to
78fe33f
Compare
This is the xDS transport channel implementation change that setup an ADS (Aggregated Data Service) stream with xDS management server which can be used by generic xDS client to do resource watching. The ADS stream and xDS channel uses the types and interfaces that were created in #8042. The e2e tests are written using types and interfaces in #8042 and grpctransport created in #8066.
The PR copies the existing ADS stream and channel implementation from internal xdsclient code along with the helpers that are part of gRPC and then modify them to use the generic client types and interfaces. The PR contains 6 commits. Odd commits (1, 3, 5) are doing the exact copy and even commits (2, 4, 6) are modifying their predecessor commit. The copy index is as follows
Copied Implementation (modified to use generic client configs and interfaces)
xds/internal/xdsclient/transport/ads
xds/internal/clients/xdsclient/internal/ads
xds/internal/xdsclient/channel.go
xds/internal/clients/xdsclient/xdsclient/channel.go
xds/internal/xdsclient/channel_test.go
xds/internal/clients/xdsclient/xdsclient/channel_test.go
Copied helpers common (mainly for testing and logging. only what is needed)
internal/backoff
xds/internal/clients/internal/backoff
internal/buffer
xds/internal/clients/internal/buffer
internal/pretty
xds/internal/clients/internal/pretty
internal/testutils/xds/e2e
xds/internal/clients/internal/testutils/e2e
internal/testutils/xds/fakeserver
xds/internal/clients/internal/testutils/fakeserver
internal/testutils/channel.go
xds/internal/clients/internal/testutils/channel.go
Copied helpers xdsclient (mainly for testing and logging. only what is needed)
xds/internal/xdsclient/xdsresource/errors.go
xds/internal/clients/xdsclient/internal/xdsresource/errors.go
xds/internal/xdsclient/xdsresource/name.go
xds/internal/clients/xdsclient/internal/xdsresource/name.go
xds/internal/xdsclient/xdsresource/type.go
xds/internal/clients/xdsclient/internal/xdsresource/type.go
xds/internal/xdsclient/xdsresource/version/version.go
xds/internal/clients/xdsclient/internal/xdsresource/version.go
RELEASE NOTES: None