Skip to content

Conversation

@fantiq
Copy link
Contributor

@fantiq fantiq commented Jul 29, 2025

What is the purpose of the change?

resolve #15271 #15003 , this issues are related to the config option registry.check.

When registry.check=true (default value is true) is set, the connection status will be checked when creating the registry client. if the check fails, an exception will be thrown.
when registry.check=false is set, no check is performed and no exception is ever thrown.

The reason for NEP is caused by setting the option registry.check to false and throwning an exception.
so here when registry.check=false no exception should be thrown. that is, the connection status should not be checked.

Currently, in the nacos client, the option nacos.check is used to determine whether to check the connection status or not.
the zookeeper client does not yet support not checking the connection status.

This PR add an new option zookeeper.check, it is similar to option nacos.check.

When creating a registry client, it will first be created using the default behavior. if an exception is thrown and registry.check=false,
it will try to create the registry client again by turning on the option (nacos.check=false or zookeeper.check=false) to not check the connection status.

Checklist

  • Make sure there is a GitHub_issue field for the change.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Write necessary unit-test to verify your logic correction. If the new feature or significant change is committed, please remember to add sample in dubbo samples project.
  • Make sure gitHub actions can pass. Why the workflow is failing and how to fix it?

@codecov-commenter
Copy link

codecov-commenter commented Jul 29, 2025

Codecov Report

❌ Patch coverage is 64.28571% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.07%. Comparing base (dea0206) to head (fb1ed51).
⚠️ Report is 2 commits behind head on 3.3.

Files with missing lines Patch % Lines
...apache/dubbo/registry/retry/AbstractRetryTask.java 0.00% 1 Missing and 1 partial ⚠️
...o/registry/nacos/util/NacosNamingServiceUtils.java 0.00% 1 Missing and 1 partial ⚠️
...n/java/org/apache/dubbo/common/utils/UrlUtils.java 0.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##                3.3   #15594      +/-   ##
============================================
  Coverage     61.06%   61.07%              
+ Complexity    11700       15   -11685     
============================================
  Files          1924     1924              
  Lines         87086    87096      +10     
  Branches      13115    13118       +3     
============================================
+ Hits          53182    53192      +10     
  Misses        28460    28460              
  Partials       5444     5444              
Flag Coverage Δ
integration-tests-java21 32.98% <28.57%> (+0.13%) ⬆️
integration-tests-java8 33.07% <28.57%> (+0.12%) ⬆️
samples-tests-java21 32.63% <28.57%> (+0.05%) ⬆️
samples-tests-java8 30.35% <28.57%> (+0.01%) ⬆️
unit-tests-java11 59.02% <64.28%> (-0.02%) ⬇️
unit-tests-java17 58.75% <64.28%> (-0.02%) ⬇️
unit-tests-java21 58.78% <64.28%> (-0.02%) ⬇️
unit-tests-java8 59.03% <64.28%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@fantiq
Copy link
Contributor Author

fantiq commented Jul 30, 2025

@RainYuY @zrlw PTAL

throw e;
}
nacosConnectionManager =
new NacosConnectionManager(connectionURL, false, retryTimes, sleepMsBetweenRetries);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change might cause some problems. This registry isn’t functioning normally, but it's still added to the RegistryManager. As a result, subsequent actions might use this registry, which could lead to more errors. I'm not sure exactly what will happen after this PR is merged, but I’ll try to reproduce the issue later.

Copy link
Contributor

@RainYuY RainYuY Jul 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After trying it out, I got the result.

23:41:19.981 |-WARN  [DubboRegistryRetryTimer-thread-1] registry.nacos.NacosNamingServiceWrapper:    -|  [DUBBO] Failed to request nacos naming server. Dubbo will try to retry in 10. Try times: 6, dubbo version: 3.3.6-SNAPSHOT, current host: 192.168.31.224, error code: 1-37. This may be caused by , go to https://dubbo.apache.org/faq/1/37 to find instructions. 
com.alibaba.nacos.api.exception.NacosException: Client not connected, current status:STARTING
	at com.alibaba.nacos.common.remote.client.RpcClient.request(RpcClient.java:645) ~[nacos-client-2.5.1.jar:?]
	at com.alibaba.nacos.common.remote.client.RpcClient.request(RpcClient.java:624) ~[nacos-client-2.5.1.jar:?]
	at com.alibaba.nacos.client.naming.remote.gprc.NamingGrpcClientProxy.requestToServer(NamingGrpcClientProxy.java:449) ~[nacos-client-2.5.1.jar:?]
	at com.alibaba.nacos.client.naming.remote.gprc.NamingGrpcClientProxy.doRegisterService(NamingGrpcClientProxy.java:252) ~[nacos-client-2.5.1.jar:?]
	at com.alibaba.nacos.client.naming.remote.gprc.NamingGrpcClientProxy.registerServiceForEphemeral(NamingGrpcClientProxy.java:147) ~[nacos-client-2.5.1.jar:?]
	at com.alibaba.nacos.client.naming.remote.gprc.NamingGrpcClientProxy.registerService(NamingGrpcClientProxy.java:138) ~[nacos-client-2.5.1.jar:?]
	at com.alibaba.nacos.client.naming.remote.NamingClientProxyDelegate.registerService(NamingClientProxyDelegate.java:96) ~[nacos-client-2.5.1.jar:?]
	at com.alibaba.nacos.client.naming.NacosNamingService.registerInstance(NacosNamingService.java:161) ~[nacos-client-2.5.1.jar:?]
	at org.apache.dubbo.registry.nacos.NacosNamingServiceWrapper.lambda$registerInstance$5(NacosNamingServiceWrapper.java:133) ~[dubbo-3.3.6-SNAPSHOT.jar:3.3.6-SNAPSHOT]
	at org.apache.dubbo.registry.nacos.NacosNamingServiceWrapper.accept(NacosNamingServiceWrapper.java:495) ~[dubbo-3.3.6-SNAPSHOT.jar:3.3.6-SNAPSHOT]
	at org.apache.dubbo.registry.nacos.NacosNamingServiceWrapper.registerInstance(NacosNamingServiceWrapper.java:133) ~[dubbo-3.3.6-SNAPSHOT.jar:3.3.6-SNAPSHOT]
	at org.apache.dubbo.registry.nacos.NacosRegistry.doRegister(NacosRegistry.java:205) ~[dubbo-3.3.6-SNAPSHOT.jar:3.3.6-SNAPSHOT]
	at org.apache.dubbo.registry.retry.FailedRegisteredTask.doRetry(FailedRegisteredTask.java:33) ~[dubbo-3.3.6-SNAPSHOT.jar:3.3.6-SNAPSHOT]
	at org.apache.dubbo.registry.retry.AbstractRetryTask.run(AbstractRetryTask.java:127) ~[dubbo-3.3.6-SNAPSHOT.jar:3.3.6-SNAPSHOT]
	at org.apache.dubbo.common.timer.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:654) ~[dubbo-3.3.6-SNAPSHOT.jar:3.3.6-SNAPSHOT]
	at org.apache.dubbo.common.timer.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:733) ~[dubbo-3.3.6-SNAPSHOT.jar:3.3.6-SNAPSHOT]
	at org.apache.dubbo.common.timer.HashedWheelTimer$Worker.run(HashedWheelTimer.java:455) ~[dubbo-3.3.6-SNAPSHOT.jar:3.3.6-SNAPSHOT]
	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]

It will return to normal once the Nacos server is back online.
It's different from metadata or the config center — I think they should follow the same behavior.
This needs some discussion.

@zrlw @AlbumenJ @oxsean @heliang666s

Copy link
Contributor

@RainYuY RainYuY left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job! Please take a look at these comments.

@fantiq
Copy link
Contributor Author

fantiq commented Aug 1, 2025

Thank you very mush for your preview.

In the zookeeper registry client, i have modified it to use registry.check directly instead of introducing new specific options.

Can it be considered that nacos.check actually replaces the function of registry.check?
So here i only need to be compatible with this situation which led to #15003.
When registry.check = false and nacos.check = true, the value of nacos.check should be false actually.

And then i will evaluate the impact of this change on the config center and metadata center.

@RainYuY @zrlw need more review, thx

@fantiq fantiq requested review from RainYuY and zrlw August 1, 2025 06:48
zrlw
zrlw previously approved these changes Aug 1, 2025
@RainYuY
Copy link
Contributor

RainYuY commented Aug 1, 2025

Thank you very mush for your preview.

In the zookeeper registry client, i have modified it to use registry.check directly instead of introducing new specific options.

Can it be considered that nacos.check actually replaces the function of registry.check? So here i only need to be compatible with this situation which led to #15003. When registry.check = false and nacos.check = true, the value of nacos.check should be false actually.

And then i will evaluate the impact of this change on the config center and metadata center.

@RainYuY @zrlw need more review, thx

The modification is OK, and I even think the nacos.check configuration item could be removed in the future.

Also, what I mentioned earlier was not that this PR affects the config center or metadata, but that their behavior is inconsistent. In the current implementation of this PR, even if the check fails, the registry is still initialized. As a result, the service keeps trying to register to an unavailable registry.

I think a better approach would be not to repeatedly attempt registration when the registry is unavailable. Instead, we should first check the availability of the registry, and only proceed with registration once it becomes available.

@fantiq
Copy link
Contributor Author

fantiq commented Aug 4, 2025

Also, what I mentioned earlier was not that this PR affects the config center or metadata, but that their behavior is inconsistent

@RainYuY Your focus is right. I have checked the startup process of the config-center and metadata. they both support the check option and need to be modified to support the logic of not checking the connection status.
I will continue to modify these later.

the service keeps trying to register to an unavailable registry

About this situation, is it possible to add connection status check logic before register?And the retry task should be retained.

@RainYuY
Copy link
Contributor

RainYuY commented Aug 4, 2025

@RainYuY Your focus is right. I have checked the startup process of the config-center and metadata. they both support the check option and need to be modified to support the logic of not checking the connection status. I will continue to modify these later.

I think there will be a lot of work to do. The best practice might be to open a new PR to reproduce it.

the service keeps trying to register to an unavailable registry

About this situation, is it possible to add connection status check logic before register?And the retry task should be retained.

Yes, I think this is the correct action.And it should be solved at this PR.

@fantiq
Copy link
Contributor Author

fantiq commented Aug 4, 2025

I think there will be a lot of work to do. The best practice might be to open a new PR to reproduce it.

@RainYuY It's a wise decision to create a new issue. In addition to making the ConfigCenter and MetadataCenter support check=false, we also need to consider the processing logic when the connection is recovery, this reqires more discussion.
I will create a new issue later.

@RainYuY
Copy link
Contributor

RainYuY commented Aug 4, 2025

LGTM

RainYuY
RainYuY previously approved these changes Aug 4, 2025
@zrlw
Copy link
Contributor

zrlw commented Sep 18, 2025

duplicated with #15639

@zrlw zrlw closed this Sep 18, 2025
@RainYuY RainYuY reopened this Oct 1, 2025
@RainYuY
Copy link
Contributor

RainYuY commented Oct 1, 2025

@fantiq cc conflicts

RainYuY
RainYuY previously approved these changes Oct 9, 2025
Copy link
Contributor

@RainYuY RainYuY left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@RainYuY RainYuY requested a review from oxsean October 9, 2025 05:33
@RainYuY
Copy link
Contributor

RainYuY commented Oct 17, 2025

@AlbumenJ @oxsean @zrlw PTAL

Copy link
Contributor

@zrlw zrlw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@RainYuY RainYuY merged commit f2d5ba9 into apache:3.3 Oct 20, 2025
60 of 92 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Refactor Registry Implement

4 participants