PCSM_deployment_architecture #21
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -16,3 +16,4 @@ __pycache__/ | |
| # Allow | ||
|
|
||
| !styles/config/vocabularies/Percona/** | ||
| .DS_Store | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,101 @@ | ||
| # Deployment architecture | |
|
Check warning on line 1 in docs/architecture.md
|
||
|
|
||
| Percona ClusterSync for MongoDB (PCSM) is a middleware synchronization tool that connects source and target clusters. It reads the change stream from the source cluster and applies those changes to the target cluster. | ||
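To make the read-and-apply idea concrete, here is a minimal conceptual sketch in Python. It is not PCSM's implementation. It assumes events shaped like MongoDB change events (`operationType`, `documentKey`, `fullDocument`, `updateDescription`) and uses an in-memory dictionary as a stand-in for the target cluster. The names `apply_change` and `replay` are hypothetical.

```python
from typing import Any, Dict, Iterable

Document = Dict[str, Any]


def apply_change(target: Dict[Any, Document], event: Document) -> None:
    """Apply one change-stream-style event to an in-memory stand-in for the target."""
    op = event["operationType"]
    key = event["documentKey"]["_id"]

    if op in ("insert", "replace"):
        # Inserts and replaces carry the whole document.
        target[key] = dict(event["fullDocument"])
    elif op == "update":
        # Updates carry only a delta. This sketch handles top-level fields only,
        # not dotted paths into nested documents.
        doc = target.setdefault(key, {"_id": key})
        delta = event["updateDescription"]
        doc.update(delta.get("updatedFields", {}))
        for field in delta.get("removedFields", []):
            doc.pop(field, None)
    elif op == "delete":
        target.pop(key, None)
    else:
        raise ValueError(f"unsupported operation type: {op}")


def replay(target: Dict[Any, Document], events: Iterable[Document]) -> Dict[Any, Document]:
    """Apply events in the order they were read from the source."""
    for event in events:
        apply_change(target, event)
    return target


# Hypothetical events, in the order a source cluster might emit them.
events = [
    {"operationType": "insert", "documentKey": {"_id": 1},
     "fullDocument": {"_id": 1, "name": "a", "qty": 1}},
    {"operationType": "update", "documentKey": {"_id": 1},
     "updateDescription": {"updatedFields": {"qty": 5}, "removedFields": ["name"]}},
    {"operationType": "insert", "documentKey": {"_id": 2},
     "fullDocument": {"_id": 2, "name": "b"}},
    {"operationType": "delete", "documentKey": {"_id": 2}},
]
result = replay({}, events)
print(result)
```

Order matters in this sketch: replaying the same events in a different order can leave the target in a different state.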
|
|
||
| Because PCSM runs as a standalone binary process, where you place it in your infrastructure can significantly affect performance. The main factor is network latency, which affects replication time. | |
|
|
||
| You can deploy PCSM in one of three architectures. | |
|
|
||
|
|
||
| ## Dedicated host (intermediary) | ||
|
|
||
| The PCSM process runs on a dedicated machine: a virtual machine, a container, or a physical server. This machine sits logically between the source and target clusters. Data migration is resource-intensive, so install PCSM as close to the target cluster as possible to reduce network latency. | |
|
|
||
| !!! info "Recommended use" | ||
| This deployment architecture is recommended for production environments as it provides the highest level of isolation and reliability for critical data synchronization. | ||
|
|
||
|
|
||
|  | ||
|
|
||
|
|
||
| | Pros | Cons | | ||
| |------|------| | ||
| | **Resource isolation**: PCSM has its own dedicated CPU and RAM, so it does not starve the source or target databases of resources. | **Network latency**: Adds an extra network hop (Source → PCSM → Target). The added latency is typically negligible on modern, low-latency networks. | |
| | **Stability**: If PCSM crashes or becomes unresponsive, the source and target clusters remain unaffected. | **Infrastructure cost**: Requires provisioning and maintaining an additional compute resource for the PCSM service. | |
| | **Scalability**: The PCSM host can be vertically scaled (for example, adding memory for large in-memory buffers) without modifying database hardware. | | | ||
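As a rough illustration of the extra-hop cost, the sketch below compares the path through a dedicated host with a direct source-to-target path. It is a simple additive model, not a measurement method. The function name `extra_hop_ms` and all latency figures are hypothetical.

```python
def extra_hop_ms(source_to_host_ms: float,
                 host_to_target_ms: float,
                 source_to_target_ms: float) -> float:
    """Estimate the latency added by routing Source -> PCSM -> Target
    instead of Source -> Target directly (simple additive model)."""
    via_host = source_to_host_ms + host_to_target_ms
    return via_host - source_to_target_ms


# Hypothetical one-way latencies in milliseconds.
# Host placed next to the target: the second leg is short.
near_target = extra_hop_ms(20.0, 0.5, 20.0)
# Host placed far from both clusters: both legs are long.
far_from_both = extra_hop_ms(15.0, 15.0, 20.0)

print(f"near target: +{near_target} ms, far from both: +{far_from_both} ms")
```

Under these assumed numbers, a host placed next to the target adds far less latency than one placed far from both clusters.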
|
|
||
|
|
||
| ## Target node (co-located) | ||
|
|
||
| The PCSM process runs directly on a primary node in the target cluster. | ||
|
|
||
|
|
||
| !!! info "Recommended use" | ||
| This deployment architecture is recommended for **one-way migrations** where the target cluster is currently empty and not serving application traffic. | ||
|
|
||
|
|
||
|  | ||
|
|
||
| | Pros | Cons | | ||
| |------|------| | ||
| | **Efficient writes**: Write operations are performed directly on the target, which helps minimize write latency. | **Limited vertical scalability**: Scaling PCSM on this co-located node requires scaling the entire database server (CPU, RAM, and I/O), which can be unnecessary and costly. Co-location may also increase resource contention with the target database. | |
| | **Minimizes impact on the production source cluster**: Resource contention, such as CPU and RAM spikes, affects the target cluster while leaving the production source cluster unaffected. | | |
|
|
||
|
|
||
| ## Source node (co-located) | |
|
|
||
| The PCSM process runs directly on a primary node in the source cluster. | |
|
|
||
|
|
||
|  | ||
|
|
||
|
|
||
| !!! warning "Recommended use" | ||
| This deployment architecture is recommended only for low-traffic source clusters or when the source node has significant available capacity. | ||
|
|
||
|
|
||
| | Pros | Cons | | ||
| |------|------| | ||
| | **Lowest read latency**: PCSM directly reads local changes from the filesystem or loopback network, which minimizes read overhead.| **Resource contention**: PCSM competes with the running source database for CPU, RAM, and network I/O resources. During heavy synchronization phases, such as the initial sync, this competition can degrade the performance of the production source cluster.| | ||
| | **Simplicity**: There is no need to provision extra hardware. | **Failure risk**: If PCSM uses too much memory or causes an operating-system-level fault, it may crash the source node. | |
|
|
||
|
|
||
| ## Choosing the right deployment architecture | ||
|
|
||
| | Deployment model | Recommended for | Risk level | | ||
| | ---------------- | --------------------------------------- | ---------- | | ||
| | Dedicated host | Production workloads and large datasets | Low | | ||
| | Target node | One-way migrations to idle targets | Medium | | ||
| | Source node | Low-traffic or non-critical sources | High | | ||
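The table above can be read as a simple decision order. The following sketch encodes it in Python as an illustration only. The `Environment` fields, the `recommend` function, and the rule of preferring the lowest-risk option that is feasible are assumptions, not part of PCSM.

```python
from dataclasses import dataclass
from enum import Enum


class Deployment(Enum):
    DEDICATED_HOST = "Dedicated host"
    TARGET_NODE = "Target node"
    SOURCE_NODE = "Source node"


# Risk levels as listed in the table.
RISK = {
    Deployment.DEDICATED_HOST: "Low",
    Deployment.TARGET_NODE: "Medium",
    Deployment.SOURCE_NODE: "High",
}


@dataclass
class Environment:
    """Hypothetical description of the environment PCSM will run in."""
    can_provision_host: bool         # a dedicated VM, container, or server is available
    target_is_idle: bool             # target is not serving application traffic
    source_has_spare_capacity: bool  # source is low-traffic or has headroom


def recommend(env: Environment) -> Deployment:
    """Pick the lowest-risk model whose precondition holds (assumed ordering)."""
    if env.can_provision_host:
        return Deployment.DEDICATED_HOST
    if env.target_is_idle:
        return Deployment.TARGET_NODE
    if env.source_has_spare_capacity:
        return Deployment.SOURCE_NODE
    raise ValueError("no deployment model fits; consider provisioning a dedicated host")


choice = recommend(Environment(can_provision_host=False,
                               target_is_idle=True,
                               source_has_spare_capacity=True))
print(f"{choice.value} (risk: {RISK[choice]})")
```

A real decision would also weigh dataset size and write throughput, which this sketch leaves out.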
|
|
||
|
|
||
| ## Use cases | ||
|
|
||
|
|
||
| === ":material-server: Dedicated host" | ||
| You can use the dedicated host (intermediary) architecture in the following scenarios: | ||
|
|
||
| - When migrating production data where stability, isolation, and predictable performance are critical. | ||
|
|
||
| - When clusters have large data sizes or sustained high write throughput, and PCSM requires significant CPU and memory resources. | |
|
|
||
| - When neither the source nor the target database nodes can tolerate additional CPU, memory, or I/O load from replication processes. | ||
|
|
||
|
|
||
| === ":material-database-arrow-right: Target node (co-located)" | ||
| You can use the target node (co-located) deployment in the following scenarios: | ||
|
|
||
| - When migrating data into a newly provisioned target cluster that is not serving application traffic yet. | ||
|
|
||
| - When the target cluster is used for validation, testing, or acceptance before being promoted to production. | |
|
|
||
| - When minimizing write latency on the target cluster is more important than isolating replication workloads. | |
|
|
||
| === ":material-database-arrow-left: Source node (co-located)" | ||
| You can use the source node (co-located) deployment in the following scenarios: | ||
|
|
||
| - When the source cluster handles minimal application traffic and has sufficient spare CPU, memory, and I/O capacity to accommodate PCSM without performance degradation. | ||
|
|
||
| - In non-production environments where a temporary performance impact on the source cluster is acceptable. | |
|
|
||
| - When provisioning additional compute resources (such as a dedicated PCSM host) is not possible. | |