WangGuangxin
Contributor

WangGuangxin commented Sep 9, 2025

What changes were proposed in this pull request?

Currently, the Python Spark Connect client doesn't support remote addresses in IPv6 format. For example:

bin/pyspark --remote "sc://[xxxx:xxxx:xx:xxx::xx]:yyyy" 

fails with:

pyspark.errors.exceptions.base.PySparkValueError: [INVALID_CONNECT_URL] Invalid URL for Spark Connect: Target destination '[xxxx:xxxx:xx:xxx::xx]:yyyy' should match the '<host>:<port>' pattern. Please update the destination to follow the correct format, e.g., 'hostname:port'.
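The error arises because an IPv6 literal contains colons of its own: for a destination such as `[::1]:15002`, splitting on ':' yields four fields ('[', '', '1]', '15002') instead of the two that the '<host>:<port>' pattern expects. URLs conventionally wrap IPv6 literals in square brackets (RFC 3986), so a bracket-aware parser can take everything up to ']' as the host. The sketch below illustrates that idea. `split_host_port` is a hypothetical helper, not the code from this PR, and the default port 15002 is an assumption about the Spark Connect default.

```python
from typing import Tuple

DEFAULT_PORT = 15002  # assumed Spark Connect default port, for illustration only


def split_host_port(netloc: str, default_port: int = DEFAULT_PORT) -> Tuple[str, int]:
    """Hypothetical helper: split '<host>[:<port>]', where host may be a bracketed IPv6 literal."""
    if netloc.startswith("["):
        # IPv6 literal: the host is everything between '[' and the first ']'.
        end = netloc.find("]")
        if end == -1:
            raise ValueError(f"Unterminated IPv6 literal in {netloc!r}")
        host = netloc[1:end]
        rest = netloc[end + 1:]
        if rest == "":
            return host, default_port
        if not rest.startswith(":"):
            raise ValueError(f"Unexpected text after IPv6 literal in {netloc!r}")
        return host, int(rest[1:])
    # Hostname or IPv4 address: at most one ':' separates host and port.
    parts = netloc.split(":")
    if len(parts) == 1:
        return parts[0], default_port
    if len(parts) == 2:
        return parts[0], int(parts[1])
    raise ValueError(f"{netloc!r} should match the '<host>:<port>' pattern")


print(split_host_port("[::1]:15002"))  # ('::1', 15002)
```

Python's standard library offers the same behaviour: `urllib.parse.urlsplit("sc://[::1]:15002")` exposes `.hostname` as `'::1'` (brackets stripped) and `.port` as `15002`, so a real fix could lean on that rather than hand-rolled parsing.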

Why are the changes needed?

Make Spark Connect support remote addresses in IPv6 format.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Manually.

Test environment:

Spark 4.0
Python 3.10.18
Core pip packages:
    grpcio                        1.74.0
    pandas                        2.2.3

Test steps:

  1. Start the Spark Connect server with sbin/start-connect-server.sh on an IPv6 host.
  2. Connect to it with the Python client: bin/pyspark --remote "sc://[$IPV6_ADDRESS]:$PORT"
  3. Connect to it with the Scala client: bin/spark-shell --remote "sc://[$IPV6_ADDRESS]:$PORT"
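The test steps depend on writing the IPv6 address in square brackets inside the `sc://` URL. A small helper such as the hypothetical `build_remote_url` below (not part of PySpark or this PR) can produce that form from a raw address while leaving hostnames and IPv4 addresses unbracketed. It uses the standard `ipaddress` module to decide whether the host is an IPv6 literal.

```python
import ipaddress


def build_remote_url(host: str, port: int) -> str:
    """Hypothetical helper: format an 'sc://' URL, bracketing bare IPv6 literals."""
    try:
        is_ipv6 = isinstance(ipaddress.ip_address(host), ipaddress.IPv6Address)
    except ValueError:
        # Not a bare IP literal (e.g. a hostname or an already-bracketed address),
        # so it is used as-is.
        is_ipv6 = False
    if is_ipv6:
        host = f"[{host}]"
    return f"sc://{host}:{port}"


print(build_remote_url("::1", 15002))  # sc://[::1]:15002
```

The resulting string can be passed to `bin/pyspark --remote` as in the steps above, or used from Python via `SparkSession.builder.remote(url).getOrCreate()`.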

Was this patch authored or co-authored using generative AI tooling?

No

Member

dongjoon-hyun left a comment

Thank you for testing on IPv6 and making a PR, @WangGuangxin .

With this PR, does everything work correctly in your IPv6 environment? Could you elaborate on the environment and the manual procedure you followed?

How was this patch tested?

Manually

@WangGuangxin
Contributor Author

@dongjoon-hyun Thanks for your review. Yes, with this patch it currently works well in my environment. I've updated the PR description with the environment details and test steps.

Member

dongjoon-hyun left a comment

Thank you. How about non-Python clients? Could you check the Scala client too and revise the PR description, @WangGuangxin ?

@WangGuangxin
Contributor Author

@dongjoon-hyun The Scala client already works well. I've checked it and updated the PR description.

dongjoon-hyun changed the title [SPARK-53529][CONNECT] Fix spark connect client doesn't support ipv6 address [SPARK-53529][CONNECT] Fix pyspark connect client to support IPv6 Sep 12, 2025
dongjoon-hyun changed the title [SPARK-53529][CONNECT] Fix pyspark connect client to support IPv6 [SPARK-53529][PYTHON][CONNECT] Fix pyspark connect client to support IPv6 Sep 12, 2025
Member

dongjoon-hyun left a comment

+1, LGTM. Thank you, @WangGuangxin .

cc @grundprinzip and @hvanhovell , too.

@dongjoon-hyun
Member

Merged to master for Apache Spark 4.1.0 (and 4.1.0-preview2).
