Merged

Changes from 1 commit
18 changes: 15 additions & 3 deletions docs/deployment.md
@@ -1,10 +1,22 @@
# Supported MongoDB deployments

* Percona Link for MongoDB supports only Replica Set to Replica Set synchronization. The source and target replica sets can have different number of nodes.
{{pcsm.full_name}} supports the following deployment topologies:

* **Replica Set to Replica Set**: The source and target replica sets can have different numbers of nodes.
* **Sharded cluster to Sharded cluster**: The source and target sharded clusters can have different numbers of shards. This functionality is in tech preview stage. See [Sharding support in {{pcsm.full_name}}](sharding.md) for details.

## Version requirements

* You can synchronize Percona Server for MongoDB or MongoDB Community/Enterprise Advanced/Atlas within the same major versions - 6.0 to 6.0, 7.0 to 7.0, 8.0 to 8.0
* Percona Link for MongoDB is supported on both ARM64 and x86_64 architectures.
* Minimal supported MongoDB versions are: 6.0.17, 7.0.13, 8.0.0
* You can connect the following MongoDB deployments:

## Supported architectures

* {{pcsm.full_name}} is supported on both ARM64 and x86_64 architectures.

## Supported MongoDB deployments

You can connect the following MongoDB deployments:

| Source | Target |
| --- | --- |
17 changes: 13 additions & 4 deletions docs/install/authentication.md
@@ -47,10 +47,19 @@ When you [install PLM from repositories](repos.md), the environment file is crea

### Example environment file

```{.text .no-copy}
PLM_SOURCE_URI="mongodb://source:mys3cretpAssword@mysource1:27017,mysource2:27017,mysource3:27017/"
PLM_TARGET_URI="mongodb://target:tops3cr3t@mytarget1:27017,mytarget2:27017,mytarget3:27017/"
```
=== "Replica sets"

```{.text .no-copy}
PLM_SOURCE_URI="mongodb://source:mys3cretpAssword@mysource1:27017,mysource2:27017,mysource3:27017/"
PLM_TARGET_URI="mongodb://target:tops3cr3t@mytarget1:27017,mytarget2:27017,mytarget3:27017/"
```

=== "Sharded clusters"

```{.text .no-copy}
PCSM_SOURCE_URI="mongodb://source-user:password@mongos-source1:27017/admin"
PCSM_TARGET_URI="mongodb://target-user:password@mongos-target1:27017/admin"
```

### Passwords with special characters

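As a side note on the section above: characters such as `@`, `:`, or `/` in a password break the URI syntax and must be percent-encoded before they go into the connection string. A minimal sketch using Python's standard library; the user name, password, and hostnames are placeholders, not values from this PR:

```python
from urllib.parse import quote_plus

# Hypothetical credentials; '@', ':', and '/' would break the URI if left raw.
user = "plm-user"
password = "p@ss:w0rd/1"

# Percent-encode only the password, then splice it into the connection string.
encoded = quote_plus(password)
uri = f"mongodb://{user}:{encoded}@mysource1:27017,mysource2:27017/"
print(uri)
```

The encoded value is what goes into the URI variables in the environment file shown above.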
24 changes: 13 additions & 11 deletions docs/install/usage.md
@@ -1,18 +1,20 @@
# Use {{plm.full_name}}
# Use {{pcsm.full_name}}

{{plm.full_name}} doesn't automatically start data replication after the startup. It has the `idle` status indicating that it is ready to accept requests.
{{pcsm.full_name}} doesn't automatically start data replication after startup. It has the `idle` status indicating that it is ready to accept requests.

You can interact with {{plm.full_name}} using the command-line interface or via the HTTP API. Read more about [PLM API](../api.md).
!!! tip "Understanding the workflow"

For an overview of how {{pcsm.short}} works and the replication workflow stages, see [How {{pcsm.full_name}} works](../intro.md).

You can interact with {{pcsm.full_name}} using the command-line interface or via the HTTP API. Read more about [{{pcsm.short}} HTTP API](../api.md).

## Before you start

Your target MongoDB cluster may be empty or contain data. PLM replicates data from the source to the target but doesn't manage the target's data. If the target already has the same data as the source, PLM overwrites it. However, if the target contains different data, PLM doesn't delete it during replication. This leads to inconsistencies between the source and target. To ensure consistency, manually delete any existing data from the target before starting replication.
Your target MongoDB cluster may be empty or contain data. {{pcsm.short}} replicates data from the source to the target but doesn't manage the target's data. If the target already has the same data as the source, {{pcsm.short}} overwrites it. However, if the target contains different data, {{pcsm.short}} doesn't delete it during replication. This leads to inconsistencies between the source and target. To ensure consistency, manually delete any existing data from the target before starting replication.

## Start the replication

Start the replication process between source and target clusters. PLM starts copying the data from the source to the target. First it does the initial sync by cloning the data and then applying all the changes that happened since the clone start.

Then it uses the [change streams :octicons-link-external-16:](https://www.mongodb.com/docs/manual/changeStreams/) to track the changes to your data on the source and replicate them to the target.
Start the replication process between source and target clusters. {{pcsm.short}} starts copying the data from the source to the target. First it does the initial sync by cloning the data and then applying all the changes that happened since the clone start. Then it uses [change streams :octicons-link-external-16:](https://www.mongodb.com/docs/manual/changeStreams/) to track changes on the source and replicate them to the target.
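
The two stages described above, clone and then change-stream replay, can be pictured as a loop that applies each event to the target. The sketch below is illustrative only: the event shape is simplified and is not the real MongoDB change stream document format.

```python
# Simplified replication loop: apply change-stream-like events to a target.
# The event shape here is illustrative, not the real change stream schema.
def apply_events(target, events):
    for ev in events:
        coll = target.setdefault(ev["ns"], {})  # namespace -> documents by key
        if ev["op"] in ("insert", "update"):
            coll[ev["key"]] = ev["doc"]
        elif ev["op"] == "delete":
            coll.pop(ev["key"], None)
    return target

target = {}
apply_events(target, [
    {"op": "insert", "ns": "app.users", "key": 1, "doc": {"name": "a"}},
    {"op": "update", "ns": "app.users", "key": 1, "doc": {"name": "b"}},
    {"op": "delete", "ns": "app.users", "key": 1},
])
print(target)  # {'app.users': {}}
```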

=== "Command line"

@@ -72,7 +74,7 @@

## Pause the replication

You can pause the replication at any moment. PLM stops the replication, saves the timestamp and enters the `paused` state. PLM uses the saved timestamp after you [resume the replication](#resume-the-replication).
You can pause the replication at any moment. {{pcsm.short}} stops the replication, saves the timestamp, and enters the `paused` state. {{pcsm.short}} uses the saved timestamp after you [resume the replication](#resume-the-replication).

=== "Command line"

@@ -90,7 +92,7 @@

## Resume the replication

Resume the replication. PLM changes the state to `running` and copies the changes that occurred to the data from the timestamp it saved when you paused the replication. Then it continues monitoring the data changes and replicating them real-time.
Resume the replication. {{pcsm.short}} changes the state to `running` and resumes watching change stream events from the timestamp it saved when you paused the replication. Then it continues monitoring data changes and replicating them in real time.
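
The resume bookkeeping amounts to replaying the event stream from the saved timestamp. A toy illustration, with plain integers standing in for MongoDB oplog timestamps:

```python
# Toy model of resume-after-pause: replay only the events at or after the
# timestamp saved on pause (integers here, not real oplog timestamps).
def events_since(events, saved_ts):
    return [ev for ev in events if ev["ts"] >= saved_ts]

stream = [
    {"ts": 1, "op": "insert"},
    {"ts": 5, "op": "update"},
    {"ts": 9, "op": "delete"},
]
paused_at = 5
pending = events_since(stream, paused_at)
print([ev["ts"] for ev in pending])  # [5, 9]
```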

=== "Command line"

@@ -143,9 +145,9 @@
$ curl http://localhost:2242/status
```
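
The status endpoint returns JSON, so the response is easy to consume from scripts. A sketch with the standard library; the `state` and `lagTimeSeconds` fields below are hypothetical examples, not the documented response schema, so check your version's actual output:

```python
import json

# Hypothetical status payload; the real response fields may differ by version.
raw = '{"state": "running", "lagTimeSeconds": 4}'
status = json.loads(raw)
print(status["state"])  # running
```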

# Finalize the replication
## Finalize the replication

When you no longer need / want to replicate data, finalize the replication. PLM stops replication, creates the required indexes on the target, and stops. This is a one-time operation. You cannot restart the replicaton after you finalized it. If you run the `start` command, PLM will start the replication anew, with the initial sync.
When you no longer need to replicate data, finalize the replication. {{pcsm.short}} stops replication, creates the required indexes on the target, and stops. This is a one-time operation. You cannot restart the replication after you finalize it. If you run the `start` command, {{pcsm.short}} starts the replication anew, with the initial sync.

=== "Command line"

86 changes: 63 additions & 23 deletions docs/intro.md
@@ -1,49 +1,89 @@
# How {{plm.full_name}} works
# How {{pcsm.full_name}} works

{{plm.full_name}} (PLM) is a binary process that replicates data between MongoDB deployments in real time until you manually finalize it. You can also make a one-time data migration from the source to the target with zero downtime.
{{pcsm.full_name}} is a binary process that replicates data between MongoDB deployments in real time until you manually finalize it. You can also make a one-time data migration from the source to the target with zero downtime.

You operate with {{plm.full_name}} using the [set of commands](plm-commands.md) or [API calls](api.md). Depending on the request it receives, {{plm.full_name}} has several states as shown in the following diagram:
You operate with {{pcsm.full_name}} using the [set of commands](pcsm-commands.md) or [API calls](api.md). Depending on the request it receives, {{pcsm.full_name}} has several states as shown in the following diagram:

![PLM states](_images/state-transition-flow.jpg)
![PCSM states](_images/state-transition-flow.jpg)

* **Idle**: PLM is up and running but not migrating data
* **Running**: PLM is replicating data from the source to the target. PLM enters the running state when you start and resume the replication
* **Paused**: PLM is not running and data is not replicated
* **Finalizing**: PLM stops the replication and is doing final checks, creates indexes
* **Idle**: {{pcsm.short}} is up and running but not migrating data
* **Running**: {{pcsm.short}} is replicating data from the source to the target. {{pcsm.short}} enters the running state when you start and resume the replication
* **Paused**: {{pcsm.short}} is not running and data is not replicated
* **Finalizing**: {{pcsm.short}} stops the replication, runs final checks, and creates indexes
* **Finalized**: all checks are complete, data replication is stopped
* **Failed**: PLM encountered an error
* **Failed**: {{pcsm.short}} encountered an error
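
The state diagram above can be sketched as a small transition table. This is a simplification: the command names are illustrative, and `failed` can be entered from any state on error.

```python
# Simplified PLM/PCSM state transitions; command names are illustrative.
TRANSITIONS = {
    ("idle", "start"): "running",
    ("running", "pause"): "paused",
    ("paused", "resume"): "running",
    ("running", "finalize"): "finalizing",
    ("finalizing", "done"): "finalized",
}

def next_state(state, command):
    if command == "error":              # any state can fail
        return "failed"
    return TRANSITIONS.get((state, command))  # None = command not valid here

print(next_state("idle", "start"))     # running
print(next_state("running", "error"))  # failed
```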

## Usage scenario
## Replication workflows

Now, let's use the data migration from MongoDB Atlas to Percona Server for MongoDB as an example to understand how PLM works.
The workflow for {{pcsm.short}} depends on your MongoDB deployment topology. Select the tab below that matches your setup:

You run a MongoDB Atlas 8.0.8 deployed as a replica set. You need to migrate to Percona Server for MongoDB 8.0.8-3, also a replica set. You have a strict requirement to migrate with zero downtime; therefore, using logical backups with [Percona Backup for MongoDB :octicons-link-external-16:](https://docs.percona.com/percona-backup-mongodb/features/logical.html) is a no-go.
=== "Replica Sets"

A solution is to use Percona Link for MongoDB. MongoDB Atlas is your source. An empty Percona Server for MongoDB replica set is your target. Data migration is a resource-intensive task. Therefore, we recommend installing PLM closest to the target to reduce the network lag as much as possible.
### Usage scenario

Create users for PLM in both MongoDB deployments. Start and connect PLM to your source and target using these user credentials. Now you are ready to start the migration.
Let's use a data migration from MongoDB Atlas to Percona Server for MongoDB as an example to understand how {{pcsm.short}} works with replica sets.

To start the migration, call the `start` command. PLM starts copying the data from the source to the target. First it does the initial sync by cloning the data and then applying all the changes that happened since the clone start.
You run a MongoDB Atlas 8.0.8 deployed as a replica set. You need to migrate to Percona Server for MongoDB 8.0.8-3, also a replica set. You have a strict requirement to migrate with zero downtime; therefore, using logical backups with [Percona Backup for MongoDB :octicons-link-external-16:](https://docs.percona.com/percona-backup-mongodb/features/logical.html) is not an option.

After the initial data sync, PLM monitors changes in the source and replicates them to the target at runtime. You don't have to stop your source deployment, it operates as usual, accepting client requests. PLM uses [change streams :octicons-link-external-16:](https://www.mongodb.com/docs/manual/changeStreams/) to track the changes to your data and replicate them to the target.
A solution is to use {{pcsm.full_name}}. MongoDB Atlas is your source. An empty Percona Server for MongoDB replica set is your target. Data migration is a resource-intensive task. Therefore, we recommend installing {{pcsm.short}} on a dedicated host closest to the target to reduce the network lag as much as possible.

You can `pause` the replication and `resume` it later. When paused, PLM saves the timestamp when it stops the replication. After you resume PLM, it copies the changes from the saved timestamp and continues real-time replication.
### Workflow steps

You can track the migration status in logs and using the `status` command. When the data migration is complete, call the `finalize` command. This makes PLM finalize the replication, create the required indexes on the target, and stop. Note that finalizing is a one-time operation. If you try to start PLM again, it will start data copy anew.
1. **Set up authentication**: Create users for {{pcsm.short}} in both MongoDB deployments. Start and connect {{pcsm.short}} to your source and target using these user credentials. See [Configure authentication in MongoDB](install/authentication.md) for details.

Afterwards, you will only need to switch your clients to connect to Percona Server for MongoDB.
2. **Start the migration**: Call the `start` command. {{pcsm.short}} starts copying the data from the source to the target. First it does the initial sync by cloning the data and then applying all the changes that happened since the clone start. See [Start the replication](install/usage.md#start-the-replication) for command details.

3. **Real-time replication**: After the initial data sync, {{pcsm.short}} monitors changes in the source and replicates them to the target at runtime. You don't have to stop your source deployment—it operates as usual, accepting client requests. {{pcsm.short}} uses [change streams :octicons-link-external-16:](https://www.mongodb.com/docs/manual/changeStreams/) to track the changes to your data and replicate them to the target.

4. **Control replication**: You can `pause` the replication and `resume` it later. When paused, {{pcsm.short}} saves the timestamp when it stops the replication. After you resume {{pcsm.short}}, it starts watching change stream events from the saved timestamp and continues real-time replication. See [Pause the replication](install/usage.md#pause-the-replication) and [Resume the replication](install/usage.md#resume-the-replication) for command details.
A collaborator commented on the line "it copies the changes from the saved timestamp and continues real-time replication":

> It will not copy anything after you resume, but rather PCSM starts watching change stream events from the moment the pause happened.

The author replied: Updated
5. **Monitor progress**: Track the migration status in logs and using the `status` command. See [Check the replication status](install/usage.md#check-the-replication-status) for details.

6. **Finalize**: When the data migration is complete, call the `finalize` command. This makes {{pcsm.short}} finalize the replication, create the required indexes on the target, and stop. Note that finalizing is a one-time operation. If you try to start {{pcsm.short}} again, it will start data copy anew. See [Finalize the replication](install/usage.md#finalize-the-replication) for command details.

7. **Cutover**: Switch your clients to connect to Percona Server for MongoDB.

For detailed instructions, see [Use {{pcsm.full_name}}](install/usage.md).

=== "Sharded Clusters (Tech Preview)"

### Usage scenario

Let's use a data migration between two sharded MongoDB clusters as an example to understand how {{pcsm.short}} works with sharded clusters.

For example, you run a MongoDB Enterprise Advanced 8.0 sharded cluster with 3 shards as your source. You need to migrate to a self-hosted Percona Server for MongoDB 8.0 sharded cluster with 5 shards as your target. You need zero-downtime migration and cannot afford to disable the balancer on either cluster, which makes traditional migration methods challenging.

A solution is to use {{pcsm.full_name}}. Since {{pcsm.short}} connects to `mongos` instances, the number of shards on source and target can differ. Install {{pcsm.short}} on a dedicated host closer to the target cluster to minimize network latency.

### Workflow steps

1. **Set up authentication**: Create users for {{pcsm.short}} in both MongoDB deployments. Configure connection strings using `mongos` hostname and port for both source and target clusters. See [Configure authentication in MongoDB](install/authentication.md) for details.

2. **Start the migration**: Call the `start` command. You don't have to disable the balancer on the target. Before starting the initial sync, {{pcsm.short}} checks data on the source cluster and reports it on the destination cluster. This way the target cluster knows what collections are sharded. Then {{pcsm.short}} starts copying all data from the source to the target. First it does the initial sync by cloning the data and then applying all the changes that happened since the clone start. See [Start the replication](install/usage.md#start-the-replication) for command details.

3. **Real-time replication**: During the replication stage, {{pcsm.short}} captures change stream events from the source cluster through `mongos` and applies them to the target cluster, ensuring real-time synchronization of data changes. The target cluster's balancer handles chunk distribution. For details about sharding-specific behavior, see [Sharding behavior](sharding.md#sharding-specific-behavior).

4. **Control replication**: You can `pause` the replication and `resume` it later, just like with replica sets. When paused, {{pcsm.short}} saves the timestamp when it stops the replication. See [Pause the replication](install/usage.md#pause-the-replication) and [Resume the replication](install/usage.md#resume-the-replication) for command details.

5. **Monitor progress**: Track the migration status in logs and using the `status` command. See [Check the replication status](install/usage.md#check-the-replication-status) for details.

6. **Finalize**: When the data migration is complete and you no longer need to run clusters in sync, call the `finalize` command to complete the migration. This makes {{pcsm.short}} finalize the replication, create the required indexes on the target, and stop. Note that finalizing is a one-time operation. If you try to start {{pcsm.short}} again, it will start data copy anew. See [Finalize the replication](install/usage.md#finalize-the-replication) for command details.

7. **Cutover**: Switch your clients to connect to the target Percona Server for MongoDB cluster.

For detailed information about sharded cluster replication, see [Sharding support in {{pcsm.full_name}}](sharding.md).

## Filtered replication

You can replicate the whole dataset or only a specific subset of data, which is a filtered replication. You can use filtered replication for various use cases, such as:
You can replicate the whole dataset or only a specific subset of data, which is a filtered replication. Filtered replication works for both replica sets and sharded clusters. You can use filtered replication for various use cases, such as:

* Spin up a new development environment with a specific subset of data instead of the whole dataset.
* Spin up a new development environment with a specific subset of data instead of the whole dataset.
* Optimize cloud storage costs for hybrid environments where your target MongoDB deployment runs in the cloud.

Specify what namespaces - databases and collections - to include and/or exclude from the replication when you start it.
Specify what namespaces (databases and collections) to include and/or exclude from the replication when you start it. See [Start the filtered replication](install/usage.md#start-the-filtered-replication) for details.
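
Include/exclude matching can be illustrated with a small matcher. The wildcard syntax and precedence here are assumptions for illustration, not the actual filter grammar accepted by the `start` command:

```python
from fnmatch import fnmatch

# Illustrative include/exclude matcher; the real filter grammar may differ.
def is_replicated(ns, include=None, exclude=()):
    # ns is "db.collection"; include=None means replicate everything.
    if any(fnmatch(ns, pat) for pat in exclude):
        return False                      # exclude wins over include
    return include is None or any(fnmatch(ns, pat) for pat in include)

print(is_replicated("app.users", include=["app.*"]))                   # True
print(is_replicated("app.tmp", include=["app.*"], exclude=["*.tmp"]))  # False
```

Checking exclude first, so it wins over include, matches a common convention; verify the actual precedence in the command reference.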

## Next steps

Ready to try out PLM?
Ready to try out {{pcsm.short}}?

[Quickstart](installation.md){.md-button}