DOC-13173 Enhance documentation for Index Rebalance and Upgrade #3805
base: release/7.6
Conversation
Swap rebalance is not an index rebalance method like DCP-based or file-based rebalance, so it can be excluded from here: https://preview.docs-test.couchbase.com/docs-server-DOC_13173_Rebalance_doc_enhancements/server/current/learn/clusters-and-availability/rebalance-and-index-service.html#index-rebalance-methods
Can we rephrase this as "index-shard affinity is introduced"?
This is only true for on-prem deployments. For Capella, the default is FBR. I'm not sure whether we use the same default for Server and Capella.

Index partitions may not share the same shard ID; a partition and its replicas share the same slot ID. In mixed-version clusters, we run FBR for provisioned clusters so that all 7.6 clusters have shard affinity by default.

As mentioned above, mixed-mode functionality is supported for provisioned clusters.
@Nischal1729, @NightWing1998

@shivanshrustagi

All discussed content, including the items shared by Nischal and Shivansh, the responses to my questions (excluding 2 to 3), and the review comments, is now documented.
> NOTE: If at least one node in the cluster is running Server 7.1 or a later version, most smart batching features apply across the cluster,
> even when some nodes are running earlier versions.

> ==== Empty Node Batching
We should add an explanation for the case where a rebalance is cancelled or fails and is then retried. In such cases, it won't restart from the previously known state, and the index rebalance may choose to move indexes across all nodes.
CC @Nischal1729
> === Swap Rebalance for Index Service

> A swap rebalance removes one or more source nodes and adds the same number of destination nodes.
> During the swap rebalance operation, indexes move only between those source and destination nodes.
As mentioned in the other review comment, it is better to mention what happens on a cancelled or failed rebalance followed by a retry.
Hi @rao-shwe, for example: index rebalance failure and restart, empty-node batching not being used, Optimize Index Placement enabled, and so on. Let's get these topics documented and highlighted properly, if they aren't already. Thanks.

@shivanshrustagi and @Nischal1729,
Please review for technical correctness. Cc: @amithk. Preview doc site login credentials.
> The Index Service maintains cluster-wide index definitions and metadata to <<#index-redistribution,redistribute indexes and replicas>> during rebalance operations.

> The rebalance operation evaluates each node's CPU, RAM, and disk bandwidth to minimize effects on database performance.
> The rebalance operation evaluates each node's CPU, RAM, and disk bandwidth to minimize effects on database performance.

Maybe we can mention availability of indexes as well:

> The rebalance operation evaluates each node's CPU, RAM, disk bandwidth, and user-defined Server Groups (Availability Zones) to minimize effects on database performance and maximize availability of indexes.
> As a result, this design provides the following benefits:

> * Provides deterministic collocation, where all indexes with the same shard ID share the same shard files.
> * Prevents replica-on-same-node conflicts.
While this is true, shard affinity alone did not enable this. We had support for this when replicated indexes were introduced.
@Nischal1729 @rao-shwe maybe we can add replica repair benefits.
> While this is true, shard affinity alone did not enable this. We had support for this when replicated indexes were introduced.

I think the context for adding these points was to emphasise why GSI has to maintain the alternate shard ID and so on, and that Plasma itself is not directly handling this assignment. It makes sense in that context; maybe we can exclude these points from the docs or give additional context.
> A rebalance automatically redistributes indexes in the following situations:

> * *Rebalance when you add an index node*:
Only enabled by default in Capella.
On-prem customers have to enable the redistribute_indexes flag.
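For illustration, a hedged sketch of how an on-prem administrator might toggle this flag over the indexer's settings REST endpoint. The port (9102), endpoint path, and setting name `indexer.settings.rebalance.redistribute_indexes` are assumptions to verify against the Couchbase REST reference for your release; the sketch only builds the request and does not send it.

```python
import base64
import json
import urllib.request


def build_redistribute_request(host, user, password, enable):
    """Build (but do not send) a POST that sets the index-redistribution
    flag. Endpoint, port, and setting name are assumptions to confirm
    against the Couchbase GSI settings REST reference."""
    url = f"http://{host}:9102/settings"
    payload = json.dumps(
        {"indexer.settings.rebalance.redistribute_indexes": enable}
    ).encode("utf-8")
    req = urllib.request.Request(url, data=payload, method="POST")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Content-Type", "application/json")
    return req


# Sending would be: urllib.request.urlopen(build_redistribute_request(...))
req = build_redistribute_request("localhost", "Administrator", "password", True)
print(req.full_url)  # http://localhost:9102/settings
```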
> Rebalance always moves indexes off of nodes that you're removing from the cluster to the remaining nodes.
> A rebalance does not affect indexes that reside on nodes that you're not removing.

> * *Rebalance when you add and remove index nodes*:
This is technically incorrect: the redistribute_indexes flag makes every index in the cluster eligible for planning/movement.
Whenever any node is being removed, only the indexes on the to-be-removed nodes are considered for planning. So for add + remove of index nodes, the indexes on the removed node get moved to the new node irrespective of the redistribute_indexes flag.
> A retry triggers a full re-planning cycle and not a continuation.
> The new plan may choose different nodes, causing more indexes to move than in the previous attempt.

> * Optimize Index Placement is enabled:
Maybe add a Caution/Important note, since rebalance time can increase due to heavy index movement when the flag is enabled where it need not be used.
This point can be removed, as it is expected that all indexes on the removed node will be moved during swap rebalance:

amithk left a comment
I have reviewed up to the examples section. Will review the examples and the rest of the sections later.
> * xref:learn:services-and-indexes/indexes/storage-modes.adoc#memory-optimized-index-storage[Memory Optimized GSI Storage], which stores most index structures in-memory and does not support file-based operations.

> Plasma, the storage engine for GSI, stores index data in shards.
Plasma, the storage engine for standard GSI
> == Index Service

> The Index Service maintains cluster-wide index definitions and metadata to <<#index-redistribution,redistribute indexes and replicas>> during rebalance operations.
Minor: Is it too early to refer to the index-redistribution section here?
> In Couchbase Server versions earlier than 7.6, Plasma automatically chose which shard an index partition belonged to.
> Couchbase Server 7.6 introduces index-shard affinity and continues to support it in later versions.

> With shard affinity, the GSI layer assigns a shard slot (an alternate shard ID) to each index partition and provides this information to Plasma.
This paragraph and a bullet point below talk about separate "layering" of GSI and storage. I think this needs to be reworded, as such an explanation is internal to the product. We should focus on explaining product behaviour without exposing internal details such as which layer implements what functionality.
> * Prevents replica-on-same-node conflicts.
> * Enables efficient data movement through File-Based Rebalance.

> Shard affinity keeps all index partitions with the same shard ID on the same node and moves them as a single unit during index rebalancing.
@Nischal1729 is this always true? Across all permutations of partition placements and further index movements?
The wording can be clearer here that it is referring to the shard UUID used by Plasma and not the alternate shard ID. But given that shard affinity and the other prerequisites for shard-based rebalance are in place, this should always be true.
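As a purely illustrative model (the names and data shapes below are invented, not the product's internal representation), partitions that share a slot ID can be grouped so that each slot moves as one unit during rebalance:

```python
from collections import defaultdict


def group_moves_by_slot(partitions):
    """Group index partitions by their shard slot ID so each slot moves
    as a single unit. `partitions` is a list of (index_name, slot_id)
    pairs; this data model is invented for illustration only."""
    units = defaultdict(list)
    for index_name, slot_id in partitions:
        units[slot_id].append(index_name)
    return dict(units)


moves = group_moves_by_slot([
    ("idx_orders_p1", "slot-1"),
    ("idx_orders_p2", "slot-2"),
    ("idx_users_p1", "slot-1"),  # shares slot-1: moves with idx_orders_p1
])
print(moves)  # {'slot-1': ['idx_orders_p1', 'idx_users_p1'], 'slot-2': ['idx_orders_p2']}
```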
> ====
> * File-Based Rebalance is supported only if you have enabled Standard Global Secondary xref:manage:manage-settings/general-settings.adoc#index-storage-mode[Index Storage Mode] on your cluster.

> * You cannot use File-Based Rebalance if you have enabled xref:learn:services-and-indexes/indexes/storage-modes.adoc#memory-optimized-index-storage[Memory Optimized Index Storage] in the xref:manage:manage-settings/general-settings.adoc#index-storage-mode[Index Storage Mode] settings on your cluster because it does not store index metadata in files.
> because it does not store index metadata in files.

This needs to be reworded. In general, I don't see a need to provide the reason here. We can just say: with memory-optimized indexes, File-Based Rebalance is not supported.
> .. <<#enabling-fbr,Enable File-Based Rebalance>> (or shard affinity).

> .. When the Rebalance operation is triggered, Couchbase Server does not use File-Based Rebalance method for those indexes right away,
> because those indexes do not have <<#shard-affinity,Shard Affinity>>.
because those indexes do not have "shard affinity metadata".
> If a File-Based Rebalance fails, you can start a new rebalance.
> The subsequent rebalance does not repeat any shard transfers that completed successfully during the earlier attempt.
> The subsequent rebalance only transfers the indexes that were incomplete or still in progress when the failure occurred.
This doesn't seem to be correct. After a restart of the rebalance, an entirely new plan is generated and a new set of index movements will be attempted.
Let's double-check this @Nischal1729.
It is already explained in detail below.
Yes, this statement is incorrect:

> The subsequent rebalance only transfers the indexes that were incomplete or still in progress when the failure occurred.

After a failure or cancel, the cluster reverts to normal operation with whatever placements were fully committed, and unfinished moves are rolled back. When a new rebalance is started, it uses a fresh plan and corresponding index movements, which are unrelated to the older incomplete index movements.
> The default batch size is `3`, which means that a rebalance rebuilds up to 3 indexes at the same time.

> NOTE: Smart batching does not work if you have enabled File-Based Rebalance.
Let's double-check this @Nischal1729.
> Users with Full Admin or Cluster Admin roles can xref:rest-api:rest-modify-index-batch-size.adoc[Modify Index Batch Size] using the REST API.
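A hedged sketch of the REST call behind that xref. The endpoint (`/settings/rebalance` on port 8091) and parameter name (`rebalanceMovesPerNode`) are assumptions to confirm against the Modify Index Batch Size REST reference; the sketch builds the request without sending it.

```python
import base64
import urllib.parse
import urllib.request


def build_batch_size_request(host, user, password, moves_per_node):
    """Build (but do not send) the REST call that adjusts the index
    rebalance batch size. The endpoint and parameter name are
    assumptions to verify against the Couchbase REST reference."""
    url = f"http://{host}:8091/settings/rebalance"
    body = urllib.parse.urlencode(
        {"rebalanceMovesPerNode": moves_per_node}).encode("ascii")
    req = urllib.request.Request(url, data=body, method="POST")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req


# Sending would be: urllib.request.urlopen(build_batch_size_request(...))
req = build_batch_size_request("localhost", "Administrator", "password", 10)
print(req.data)  # b'rebalanceMovesPerNode=10'
```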
> You can use smart batching for one or more of the following purposes:
I am not sure the information added by the bullet points here is consumable by customers.
We can just say that smart batching optimizes the batching of indexes during the index transfers done during rebalance, which speeds up the index rebalance process. That should be enough; the bullet points don't add any extra value.
> Empty node batching is a behavior in rebalance planner that occurs when a newly added Index Service node has no indexes on it.
> Because the node is empty, there are no placement conflicts, no replica-placement constraints, and no query workload running on the node.
> This allows the planner to group multiple index movements together and schedule them as a single batch, rather than planning each index transfer individually.
We should refer to the empty-node batch size setting and mention how it differs from the regular batch size setting. We should also mention the defaults for both of these batch sizes. Maybe add a note.
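To make the two-batch-size distinction concrete, a toy sketch of splitting pending index moves into batches. The batch sizes used here are placeholders, not confirmed product defaults; the point is only that an empty destination node, having no placement conflicts, can tolerate a larger batch than a regular rebalance.

```python
def plan_batches(index_moves, batch_size):
    """Split a list of pending index movements into batches of at most
    `batch_size` moves each. Toy model for illustration only."""
    return [index_moves[i:i + batch_size]
            for i in range(0, len(index_moves), batch_size)]


moves = [f"idx_{n}" for n in range(7)]
regular = plan_batches(moves, 3)     # regular batch size (placeholder value)
empty_node = plan_batches(moves, 5)  # hypothetical larger empty-node batch size
print(len(regular), len(empty_node))  # 3 2
```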
@amithk, @Nischal1729, and @shivanshrustagi, please complete your reviews and add your comments. I'll begin implementing them on Friday morning. I started working on a few comments last evening, but that can create conflicts with the version where you are adding comments, so I'll continue implementing your comments after a round of reviews is complete.
DOC-13173
Preview doc login details.
New page: Index Rebalance.
(Currently, I've set ToC/Headings on the RHS to level-3 for your convenience. It looks crowded, so I'll reduce it to 2 before publishing.)
Please ignore the preview yml file.