PCSM-148 Sharding support #17

docs/intro.md

# How {{pcsm.full_name}} works

{{pcsm.full_name}} is a binary process that replicates data between MongoDB deployments in real time until you manually finalize it. You can also make a one-time data migration from the source to the target with zero downtime.

You operate {{pcsm.full_name}} using the [set of commands](pcsm-commands.md) or [API calls](api.md). Depending on the request it receives, {{pcsm.full_name}} moves through several states, as shown in the following diagram:

|  | ||
|  | ||
|
Check notice on line 7 in docs/intro.md
|
||
|
|
||
* **Idle**: {{pcsm.short}} is up and running but not migrating data
* **Running**: {{pcsm.short}} is replicating data from the source to the target. {{pcsm.short}} enters this state when you start or resume the replication
* **Paused**: {{pcsm.short}} is not running and data is not replicated
* **Finalizing**: {{pcsm.short}} stops the replication, runs final checks, and creates indexes on the target
* **Finalized**: all checks are complete and data replication is stopped
* **Failed**: {{pcsm.short}} encountered an error

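You can check which state {{pcsm.short}} is currently in with the `status` command or its API equivalent. The sketch below is illustrative only: the port and the response fields are assumptions based on the [API reference](api.md), not guaranteed output.

```sh
# Ask a locally running pcsm process for its current state.
# The port (2242) and the response shape are assumptions here;
# check the API reference for the authoritative contract.
curl -s http://localhost:2242/status

# Illustrative response:
# {"ok": true, "state": "running"}
```
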
## Replication workflows

The workflow for {{pcsm.short}} depends on your MongoDB deployment topology. Select the tab below that matches your setup:

=== "Replica Sets"

    ### Usage scenario

    Let's use a data migration from MongoDB Atlas to Percona Server for MongoDB as an example to understand how {{pcsm.short}} works with replica sets.

    You run a MongoDB Atlas 8.0.8 deployment configured as a replica set. You need to migrate to Percona Server for MongoDB 8.0.8-3, also a replica set. You have a strict requirement to migrate with zero downtime; therefore, using logical backups with [Percona Backup for MongoDB :octicons-link-external-16:](https://docs.percona.com/percona-backup-mongodb/features/logical.html) is not an option.

    A solution is to use {{pcsm.full_name}}. MongoDB Atlas is your source. An empty Percona Server for MongoDB replica set is your target. Data migration is a resource-intensive task; therefore, we recommend installing {{pcsm.short}} on a dedicated host as close to the target as possible to reduce network lag.

    ### Workflow steps

    1. **Set up authentication**: Create users for {{pcsm.short}} in both MongoDB deployments. Start {{pcsm.short}} and connect it to your source and target using these user credentials. See [Configure authentication in MongoDB](install/authentication.md) for details.

    2. **Start the migration**: Call the `start` command. {{pcsm.short}} starts copying the data from the source to the target. First it does the initial sync by cloning the data, then it applies all the changes that happened since the clone started. See [Start the replication](install/usage.md#start-the-replication) for command details, and the sketch after this list for how the steps fit together.

    3. **Real-time replication**: After the initial data sync, {{pcsm.short}} monitors changes in the source and replicates them to the target at runtime. You don't have to stop your source deployment; it operates as usual, accepting client requests. {{pcsm.short}} uses [change streams :octicons-link-external-16:](https://www.mongodb.com/docs/manual/changeStreams/) to track the changes to your data and replicate them to the target (see the change stream example below).

    4. **Control replication**: You can `pause` the replication and `resume` it later. When paused, {{pcsm.short}} saves the timestamp at which it stopped the replication. After you resume {{pcsm.short}}, it copies the changes from the saved timestamp onward and continues real-time replication. See [Pause the replication](install/usage.md#pause-the-replication) and [Resume the replication](install/usage.md#resume-the-replication) for command details.

    5. **Monitor progress**: Track the migration status in logs and using the `status` command. See [Check the replication status](install/usage.md#check-the-replication-status) for details.

    6. **Finalize**: When the data migration is complete, call the `finalize` command. This makes {{pcsm.short}} finalize the replication, create the required indexes on the target, and stop. Note that finalizing is a one-time operation: if you start {{pcsm.short}} again, it begins the data copy anew. See [Finalize the replication](install/usage.md#finalize-the-replication) for command details.

    7. **Cutover**: Switch your clients to connect to Percona Server for MongoDB.

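    The following sketch strings these steps together as calls against a locally running {{pcsm.short}} process. It is a minimal illustration, not a verbatim recipe: the binary name, flags, port, and endpoint names are assumptions based on the [commands](pcsm-commands.md) and [API](api.md) references, and the connection strings and credentials are placeholders.

    ```sh
    # Placeholders throughout: replace users, hosts, and the port with
    # your values; flag and endpoint names are assumptions, so verify
    # them against the commands and API references.

    # Start pcsm, pointing it at the source (Atlas) and target (PSMDB).
    pcsm --source "mongodb+srv://pcsmUser:secret@cluster0.example.mongodb.net/" \
         --target "mongodb://pcsmUser:secret@psmdb-0:27017,psmdb-1:27017/?replicaSet=rs0" &

    # Step 2: kick off the migration (initial sync, then change replication).
    curl -X POST http://localhost:2242/start

    # Step 4 (optional): pause and resume the replication.
    curl -X POST http://localhost:2242/pause
    curl -X POST http://localhost:2242/resume

    # Step 5: watch progress.
    curl -s http://localhost:2242/status

    # Step 6: finalize once the migration is complete (one-time operation).
    curl -X POST http://localhost:2242/finalize
    ```
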
    For detailed instructions, see [Use {{pcsm.full_name}}](install/usage.md).

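    Step 3 above relies on MongoDB change streams. The snippet below is not part of {{pcsm.short}} itself; it simply uses `mongosh` to show the kind of events a change stream emits on the source, which {{pcsm.short}} consumes and replays on the target. The URI and namespace are placeholders.

    ```sh
    # Watch change events on a source collection with mongosh.
    # Plain MongoDB functionality, shown only to illustrate the event
    # stream pcsm consumes; the URI and namespace are placeholders.
    mongosh "mongodb+srv://cluster0.example.mongodb.net/" --eval '
      const cursor = db.getSiblingDB("shop").orders.watch();
      while (cursor.hasNext()) {
        // e.g. { operationType: "insert", fullDocument: { ... }, ... }
        printjson(cursor.next());
      }
    '
    ```
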
| === "Sharded Clusters (Tech Preview)" | ||
|
Check notice on line 48 in docs/intro.md
|
||
|
|
||
    ### Usage scenario

    Let's use a data migration between two sharded MongoDB clusters as an example to understand how {{pcsm.short}} works with sharded clusters.

    You run a MongoDB Enterprise Advanced 8.0 sharded cluster with 3 shards as your source. You need to migrate to a self-hosted Percona Server for MongoDB 8.0 sharded cluster with 5 shards as your target. You need a zero-downtime migration and cannot afford to disable the balancer on either cluster, which makes traditional migration methods challenging.

    A solution is to use {{pcsm.full_name}}. Since {{pcsm.short}} connects to `mongos` instances, the number of shards on the source and target can differ. Install {{pcsm.short}} on a dedicated host close to the target cluster to minimize network latency.

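    Because {{pcsm.short}} talks to `mongos` routers rather than to individual shards, both connection strings point at a router. A minimal sketch, with placeholder hosts, credentials, and binary name (see [Configure authentication in MongoDB](install/authentication.md) for the real setup):

    ```sh
    # Point both URIs at mongos routers, not at individual shard members.
    # Hosts, credentials, and the binary name are placeholders.
    pcsm --source "mongodb://pcsmUser:secret@source-mongos.example.com:27017/" \
         --target "mongodb://pcsmUser:secret@target-mongos.example.com:27017/"
    ```
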
    ### Workflow steps

    1. **Set up authentication**: Create users for {{pcsm.short}} in both MongoDB deployments. Configure connection strings using the `mongos` hostname and port for both the source and target clusters. See [Configure authentication in MongoDB](install/authentication.md) for details.

    2. **Start the migration**: Call the `start` command. You don't have to disable the balancer on the target. Before starting the initial sync, {{pcsm.short}} checks the data on the source cluster and reports it to the destination cluster. This way the target cluster knows which collections are sharded (see the verification sketch after this list). Then {{pcsm.short}} starts copying all data from the source to the target. First it does the initial sync by cloning the data, then it applies all the changes that happened since the clone started. See [Start the replication](install/usage.md#start-the-replication) for command details.

    3. **Real-time replication**: During the replication stage, {{pcsm.short}} captures change stream events from the source cluster through `mongos` and applies them to the target cluster, ensuring real-time synchronization of data changes. The target cluster's balancer handles chunk distribution. For details about sharding-specific behavior, see [Sharding behavior](sharding.md#sharding-specific-behavior).

    4. **Control replication**: You can `pause` the replication and `resume` it later, just like with replica sets. When paused, {{pcsm.short}} saves the timestamp at which it stopped the replication. See [Pause the replication](install/usage.md#pause-the-replication) and [Resume the replication](install/usage.md#resume-the-replication) for command details.

    5. **Monitor progress**: Track the migration status in logs and using the `status` command. See [Check the replication status](install/usage.md#check-the-replication-status) for details.

    6. **Finalize**: When the data migration is complete and you no longer need to run the clusters in sync, call the `finalize` command to complete the migration. This makes {{pcsm.short}} finalize the replication, create the required indexes on the target, and stop. Note that finalizing is a one-time operation: if you start {{pcsm.short}} again, it begins the data copy anew. See [Finalize the replication](install/usage.md#finalize-the-replication) for command details.

    7. **Cutover**: Switch your clients to connect to the target Percona Server for MongoDB cluster.

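    Step 2 notes that {{pcsm.short}} reports the source's sharded collections to the target before the initial sync. You can confirm that the target picked them up with a standard MongoDB metadata query. This check is illustrative, not a required workflow step; the URI is a placeholder.

    ```sh
    # List the collections the target cluster tracks as sharded, together
    # with their shard keys. Standard MongoDB metadata query, run through
    # the target mongos; the URI is a placeholder.
    mongosh "mongodb://pcsmUser:secret@target-mongos.example.com:27017/" --eval '
      db.getSiblingDB("config").collections
        .find({}, { _id: 1, key: 1 })
        .forEach(printjson);
    '
    ```
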
    For detailed information about sharded cluster replication, see [Sharding support in {{pcsm.full_name}}](sharding.md).

## Filtered replication

You can replicate the whole dataset or only a specific subset of data; the latter is called filtered replication. Filtered replication works for both replica sets and sharded clusters. You can use it for various use cases, such as:

* Spin up a new development environment with a specific subset of data instead of the whole dataset.
* Optimize cloud storage costs for hybrid environments where your target MongoDB deployment runs in the cloud.

Specify what namespaces (databases and collections) to include and/or exclude from the replication when you start it. See [Start the filtered replication](install/usage.md#start-the-filtered-replication) for details.

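A filtered start might look like the following sketch. The request body shape shown here (`includeNamespaces`/`excludeNamespaces`) and the port are assumptions; take the exact parameter names from the usage guide linked above.

```sh
# Replicate only the "shop" database plus one analytics collection,
# skipping a noisy events collection. Field names, namespace patterns,
# and the port are assumptions; verify them against the usage guide.
curl -X POST http://localhost:2242/start \
  -H "Content-Type: application/json" \
  -d '{
        "includeNamespaces": ["shop.*", "analytics.reports"],
        "excludeNamespaces": ["shop.events"]
      }'
```
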
## Next steps

Ready to try out {{pcsm.short}}?

[Quickstart](installation.md){.md-button}