satheeshaGowda
Problem Statement

Currently, we use slot ranges to uniquely identify the shards in the cluster. This has worked so far, but with split ranges it gets ugly and creates unnecessary challenges, because both the Valkey control plane and the client library have to maintain their own shardId.

Example:

shardId:0-999_2001-3999_4501-5460


1) 1) "slots"
   2) 1) (integer) 0
      2) (integer) 999
      3) (integer) 2001
      4) (integer) 3999
      5) (integer) 4501
      6) (integer) 5460
   3) "nodes"
   4) 1)  1) "id"
          2) "6e76043bed00e716e85035107866ea16e9a5f700"
          3) "port"
          4) (integer) 6385
          5) "ip"
          6) "127.0.0.1"
          7) "endpoint"
          8) "127.0.0.1"
          9) "role"
         10) "replica"
         11) "replication-offset"
         12) (integer) 8092
         13) "health"
         14) "online"
      2)  1) "id"
          2) "b2f8c841707b2246ec2a641c37f16e88fe0bb700"
          3) "port"
          4) (integer) 6380
          5) "ip"
          6) "127.0.0.1"
          7) "endpoint"
          8) "127.0.0.1"
          9) "role"
         10) "master"
         11) "replication-offset"
         12) (integer) 8092
         13) "health"
         14) "online"
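
To illustrate why a slot-range-derived identifier is awkward, here is a minimal sketch of how a control plane might build one such as the shardId above. The function name format_slot_range_id and the flat start/end array layout are assumptions made for illustration; this is not code from Valkey or from any client library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds an identifier such as "0-999_2001-3999" from a
 * flat array of slot range bounds {start0, end0, start1, end1, ...}.
 * Returns 0 on success, or -1 if the bounds do not form pairs or if buf is
 * too small to hold the full identifier. */
int format_slot_range_id(const int *bounds, size_t nbounds, char *buf, size_t buflen) {
    assert(bounds != NULL && buf != NULL);
    if (nbounds == 0 || nbounds % 2 != 0 || buflen == 0) return -1;
    buf[0] = '\0';
    for (size_t i = 0; i < nbounds; i += 2) {
        size_t used = strlen(buf);
        /* Ranges are joined with '_' and each range is written as start-end. */
        int n = snprintf(buf + used, buflen - used, "%s%d-%d",
                         i == 0 ? "" : "_", bounds[i], bounds[i + 1]);
        if (n < 0 || (size_t)n >= buflen - used) return -1;
    }
    return 0;
}
```

An identifier built this way grows with every split range and changes whenever slots migrate, so it cannot serve as a stable name for the shard.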

One might argue that the shard ID can already be obtained with CLUSTER MYSHARDID, but that command has to be executed on each node in the cluster, which is unnecessary overhead.

Proposed Solution

This change exposes the existing unique shard ID of each shard, which is already persisted in nodes.conf, as part of the CLUSTER SHARDS command response.

1) 1) "slots"
   2) 1) (integer) 0
      2) (integer) 999
      3) (integer) 2001
      4) (integer) 3999
      5) (integer) 4501
      6) (integer) 5460
   3) "nodes"
   4) 1)  1) "id"
          2) "6e76043bed00e716e85035107866ea16e9a5f700"
          3) "port"
          4) (integer) 6385
          5) "ip"
          6) "127.0.0.1"
          7) "endpoint"
          8) "127.0.0.1"
          9) "role"
         10) "replica"
         11) "replication-offset"
         12) (integer) 8092
         13) "health"
         14) "online"
      2)  1) "id"
          2) "b2f8c841707b2246ec2a641c37f16e88fe0bb700"
          3) "port"
          4) (integer) 6380
          5) "ip"
          6) "127.0.0.1"
          7) "endpoint"
          8) "127.0.0.1"
          9) "role"
         10) "master"
         11) "replication-offset"
         12) (integer) 8092
         13) "health"
         14) "online"
   5) "shard"
   6) 1) "id"
      2) "3f2a7bb7bbd5fc2a331fe9bf95f5e02bcca02430"
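
As a rough sketch of the client side, the new field could be read from one shard entry as shown below. The reply struct is a hypothetical, minimal stand-in for whatever parsed-reply type a client library provides, and find_field and shard_id_of are illustrative names, not part of this change.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, minimal stand-in for a parsed reply; real client libraries
 * define their own reply types. */
typedef enum { REPLY_INTEGER, REPLY_STRING, REPLY_ARRAY } reply_type;

typedef struct reply {
    reply_type type;
    long long integer;            /* valid when type == REPLY_INTEGER */
    const char *str;              /* valid when type == REPLY_STRING */
    const struct reply *elements; /* valid when type == REPLY_ARRAY */
    size_t len;                   /* number of elements */
} reply;

/* Looks up key in an array laid out as alternating key/value elements, which
 * is how the output above presents each shard entry. Returns NULL if the key
 * is absent or map is not an array. */
const reply *find_field(const reply *map, const char *key) {
    assert(key != NULL);
    if (map == NULL || map->type != REPLY_ARRAY) return NULL;
    for (size_t i = 0; i + 1 < map->len; i += 2) {
        const reply *k = &map->elements[i];
        if (k->type == REPLY_STRING && strcmp(k->str, key) == 0)
            return &map->elements[i + 1];
    }
    return NULL;
}

/* Returns the shard ID string of one shard entry, or NULL when the "shard"
 * field is missing (for example, a server without this change). */
const char *shard_id_of(const reply *shard_entry) {
    const reply *id = find_field(find_field(shard_entry, "shard"), "id");
    return (id != NULL && id->type == REPLY_STRING) ? id->str : NULL;
}
```

Looking fields up by key rather than by position means the same code keeps working against servers that do not return the "shard" field.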

This has several key benefits:

Simplified and More Robust Client-Side Logic: Clients can now use the shard_id to build a more resilient and accurate internal representation of the cluster topology. This simplifies the logic required to handle MOVED redirections and other cluster state changes, as clients can reliably map slots to a consistent shard identity.

Improved Observability and Control Plane: Management and monitoring tools can leverage the shard_id to track shard-level performance metrics, configuration history, and health status over time, regardless of which node is the current primary.

Alternatives you've considered:

  • Include the shard ID in the slots section. This could break backward compatibility, depending on how clients parse the response.

1) 1) "slots"
   2) 1) (integer) 0
      2) (integer) 999
      3) (integer) 2001
      4) (integer) 3999
      5) (integer) 4501
      6) (integer) 5460
      7) "id"
      8) "3f2a7bb7bbd5fc2a331fe9bf95f5e02bcca02430"

  • Not including this change in the CLUSTER SLOTS response, since that command is tagged for deprecation in favor of CLUSTER SHARDS.
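
The backward-compatibility concern with the first alternative can be sketched with a hypothetical strict client parser that treats the "slots" array purely as integer start/end pairs. The elem type and count_slot_ranges_strict are illustrative assumptions; actual clients may parse the response differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal element type for the contents of a "slots" array. */
typedef enum { ELEM_INTEGER, ELEM_STRING } elem_type;

typedef struct {
    elem_type type;
    long long integer; /* valid when type == ELEM_INTEGER */
    const char *str;   /* valid when type == ELEM_STRING */
} elem;

/* Counts slot ranges, assuming the array holds only integer start/end pairs.
 * Returns -1 if the length is odd or any element is not an integer. */
int count_slot_ranges_strict(const elem *slots, size_t len) {
    assert(slots != NULL || len == 0);
    if (len % 2 != 0) return -1;
    for (size_t i = 0; i < len; i++) {
        if (slots[i].type != ELEM_INTEGER) return -1;
    }
    return (int)(len / 2);
}
```

A parser written like this accepts the six integers in the current layout but rejects an array that ends with "id" and a string, which is why the proposal places the ID under a separate "shard" key.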

Signed-off-by: Satheesha Chattenahalli Hanume Gowda <[email protected]>
addNodeDetailsToShardReply(c, n);
clusterFreeNodesSlotsInfo(n);
}
addReplyBulkCString(c, "shard");
Author

Perhaps, "shards" is semantically more cohesive with "nodes" and "slots"?

Although it is cohesive, "shards" is a bit misleading because there should be only 1 shard per entry of the CLUSTER SHARDS command. "shard" is more appropriate because it reflects that there is only 1 shard in this entry.

@nilanshu-sharma nilanshu-sharma left a comment

Looks good to me

@hpatro hpatro self-requested a review September 8, 2025 15:09