docs: add WAL documentation and dedicated WAL volume configuration #1880
Merged
+220
−2
41 changes: 41 additions & 0 deletions
docs/user-guide/deployments-administration/wal/local-wal.md
@@ -0,0 +1,41 @@
---
keywords: [Configuration, Local WAL, GreptimeDB Datanode, GreptimeDB]
description: This section describes how to configure the Local WAL for the GreptimeDB Datanode component.
---
# Local WAL

This section describes how to configure the local WAL for the GreptimeDB Datanode component.

```toml
[wal]
provider = "raft_engine"
file_size = "128MB"
purge_threshold = "1GB"
purge_interval = "1m"
read_batch_size = 128
sync_write = false
```

## Options

If you deploy GreptimeDB with the Helm Chart, refer to [Common Helm Chart Configurations](/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md) to learn how to configure the Datanode by injecting configuration files.

| Configuration Option | Description                                                                                                          | Default Value     |
| -------------------- | -------------------------------------------------------------------------------------------------------------------- | ----------------- |
| `provider`           | The provider of the WAL. Options: `raft_engine` (local file system storage) or `kafka` (remote WAL storage in Kafka) | `"raft_engine"`   |
| `dir`                | The directory where WAL logs are written                                                                             | `{data_home}/wal` |
| `file_size`          | The size of a single WAL log file                                                                                    | `128MB`           |
| `purge_threshold`    | The total WAL size that triggers purging                                                                             | `1GB`             |
| `purge_interval`     | The interval at which purging is triggered                                                                           | `1m`              |
| `read_batch_size`    | The read batch size                                                                                                  | `128`             |
| `sync_write`         | Whether to call fsync on every log write                                                                             | `false`           |
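
As a rough illustration of how these options combine, the sketch below favors durability over write latency by enabling `sync_write`. The values are illustrative assumptions, not tuned recommendations; only the option names come from the table above.

```toml
[wal]
provider = "raft_engine"
# Illustrative choice: call fsync on every log write, so each write reaches
# disk before it returns, at the cost of higher write latency.
sync_write = true
# Remaining options are left at the defaults listed in the table.
file_size = "128MB"
purge_threshold = "1GB"
purge_interval = "1m"
```

Leaving `sync_write` at its default of `false` is the lower-latency choice; which trade-off is acceptable depends on the workload and the underlying storage.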

## Best practices

### Using a separate high-performance volume for WAL

Configuring a separate volume for the WAL (Write-Ahead Log) directory is beneficial when deploying GreptimeDB. This setup allows you to:

- Leverage a high-performance disk, either a dedicated physical volume or one provisioned via a custom `StorageClass`.
- Isolate WAL I/O from cache file access, reducing I/O contention and improving overall system performance.

If you deploy GreptimeDB with the Helm Chart, refer to [Common Helm Chart Configurations](/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md) to learn how to configure a dedicated WAL volume.
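
Outside of Helm, the same idea can be sketched directly in the Datanode configuration by pointing `dir` at the mount point of the dedicated volume. The path below is hypothetical; it assumes the volume is already mounted there and writable by the Datanode process.

```toml
[wal]
provider = "raft_engine"
# Hypothetical mount point of the dedicated WAL volume; replace with your own.
# When `dir` is not set, the WAL is written to `{data_home}/wal`.
dir = "/mnt/greptimedb-wal"
```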
48 changes: 48 additions & 0 deletions
docs/user-guide/deployments-administration/wal/overview.md
@@ -0,0 +1,48 @@
---
keywords: [WAL, Write-Ahead Logging, Local WAL, Remote WAL, GreptimeDB]
description: This section describes the WAL (Write-Ahead Logging) in GreptimeDB, including the advantages and disadvantages of Local WAL and Remote WAL.
---
# Overview

[Write-Ahead Logging](/contributor-guide/datanode/wal.md#introduction) (WAL) is a crucial component of GreptimeDB. It persistently records every data modification so that data cached in memory is not lost if a failure occurs. GreptimeDB provides two WAL storage options:

- **Local WAL**: Uses an embedded storage engine ([raft-engine](https://github.com/tikv/raft-engine)) within the [Datanode](/user-guide/concepts/why-greptimedb.md).

- **Remote WAL**: Uses [Apache Kafka](https://kafka.apache.org/) as the external (remote) WAL storage component.

## Local WAL

### Advantages

- **Low latency**: The local WAL runs within the Datanode process, eliminating network overhead and providing low write latency.

- **Easy to deploy**: Since the WAL is co-located with the Datanode, no additional components are required, which simplifies deployment and operations.

- **Zero RPO**: When deploying GreptimeDB in the cloud, you can configure persistent storage for WAL data using cloud storage services such as AWS EBS or GCP Persistent Disk. This ensures a zero [Recovery Point Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Point_Objective) (RPO), meaning no data loss, even in the event of system failure.

### Disadvantages

- **High RTO**: Because the WAL resides on the same node as the Datanode, the [Recovery Time Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Time_Objective) (RTO) is relatively high. After a Datanode restarts, it must replay the WAL to restore the latest data, and the node remains unavailable during the replay.

- **Single-point access limitation**: The local WAL is tightly coupled with the Datanode process and supports only a single consumer, which limits features such as region hot standby and [Region Migration](/user-guide/deployments-administration/manage-data/region-migration.md).

## Remote WAL

### Advantages

- **Low RTO**: Decoupling the WAL from the Datanode minimizes the [Recovery Time Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Time_Objective) (RTO). If a Datanode crashes, the Metasrv can quickly trigger a [Region Failover](/user-guide/deployments-administration/manage-data/region-failover.md) to migrate the affected regions to healthy nodes, without the need to replay the WAL locally.

- **Multi-consumer subscriptions**: Remote WAL supports multiple consumers subscribing to WAL logs simultaneously, enabling features such as region hot standby and [Region Migration](/user-guide/deployments-administration/manage-data/region-migration.md), which improves system availability and flexibility.

### Disadvantages

- **External dependencies**: Remote WAL relies on an external Kafka cluster, which increases the complexity of deployment, operation, and maintenance.

- **Network overhead**: Since WAL data must be transmitted over the network, cluster network bandwidth needs careful planning to ensure low latency and high throughput, especially under write-heavy workloads.
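
To make the difference between the two options concrete, the choice comes down to the `provider` value in the Datanode's `[wal]` section. The fragment below is a minimal sketch of the Remote WAL side: the `kafka` provider value is documented in the Local WAL options, while the `broker_endpoints` key and the address are assumptions for illustration and should be checked against the Remote WAL guide.

```toml
[wal]
# Local WAL would use: provider = "raft_engine"
provider = "kafka"
# Assumed key name and placeholder address of the external Kafka cluster.
broker_endpoints = ["kafka.example.internal:9092"]
```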

## Next steps

- To configure Local WAL storage, refer to [Local WAL](/user-guide/deployments-administration/wal/local-wal.md).

- To learn more about Remote WAL, refer to [Remote WAL](/user-guide/deployments-administration/wal/remote-wal/quick-start.md).
45 changes: 45 additions & 0 deletions
...gin-content-docs/current/user-guide/deployments-administration/wal/local-wal.md
@@ -0,0 +1,45 @@
---
keywords: [Configuration, Local WAL, GreptimeDB Datanode, GreptimeDB]
description: Describes how to configure the local WAL in GreptimeDB.
---

# Local WAL

This section describes how to configure the local WAL for the GreptimeDB Datanode component.

```toml
[wal]
provider = "raft_engine"
file_size = "128MB"
purge_threshold = "1GB"
purge_interval = "1m"
read_batch_size = 128
sync_write = false
```

## Options

If you deploy GreptimeDB with the Helm Chart, refer to [Common Helm Chart Configurations](/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md) to learn how to configure the Datanode by injecting configuration files.

| Configuration Option | Description                                                                                                      | Default Value     |
| -------------------- | ---------------------------------------------------------------------------------------------------------------- | ----------------- |
| `provider`           | The WAL provider. Options: `raft_engine` (local file system storage) or `kafka` (remote WAL storage using Kafka) | `"raft_engine"`   |
| `dir`                | The directory where logs are written                                                                             | `{data_home}/wal` |
| `file_size`          | The size of a single WAL log file                                                                                | `128MB`           |
| `purge_threshold`    | The total WAL size threshold that triggers purging                                                               | `1GB`             |
| `purge_interval`     | The interval at which purging is triggered                                                                       | `1m`              |
| `read_batch_size`    | The read batch size                                                                                              | `128`             |
| `sync_write`         | Whether to call fsync on every log write                                                                         | `false`           |

## Best practices

### Using a separate high-performance volume as the WAL directory

Configuring a separate volume as the WAL directory offers significant advantages when deploying GreptimeDB. Doing so lets you:

- Use a high-performance disk, either a dedicated physical volume or a custom high-performance `StorageClass`, to improve WAL write throughput.
- Isolate WAL I/O from cache file access, reducing I/O contention and improving overall system performance.

If you deploy GreptimeDB with the Helm Chart, refer to [Common Helm Chart Configurations](/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md) to learn how to configure a dedicated WAL volume.
51 changes: 51 additions & 0 deletions
...ugin-content-docs/current/user-guide/deployments-administration/wal/overview.md
@@ -0,0 +1,51 @@
---
keywords: [WAL, Write-Ahead Logging, Local WAL, Remote WAL, GreptimeDB]
description: Describes the WAL (Write-Ahead Logging) in GreptimeDB, including the advantages and disadvantages of local WAL and remote WAL.
---
# Overview

[Write-Ahead Logging](/contributor-guide/datanode/wal.md#introduction) (WAL) is one of the key components of GreptimeDB. It persistently records every data modification so that in-memory data is not lost when a failure occurs. GreptimeDB supports two WAL storage options:

- **Local WAL**: Uses the embedded storage engine [raft-engine](https://github.com/tikv/raft-engine), integrated directly into the [Datanode](/user-guide/concepts/why-greptimedb.md) service.

- **Remote WAL**: Uses [Apache Kafka](https://kafka.apache.org/) as the external WAL storage component.

## Local WAL

### Advantages

- **Low latency**: The local WAL runs inside the Datanode process, avoiding network transfer overhead and providing lower write latency.

- **Easy to deploy**: Because the WAL is tightly coupled with the Datanode, no additional components are needed, which makes deployment and operations simpler.

- **Zero RPO**: When deploying GreptimeDB in a cloud environment, you can persist WAL data with cloud storage services (such as AWS EBS or GCP Persistent Disk), achieving a zero [Recovery Point Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Point_Objective) (RPO): no written data is lost even if a failure occurs.

### Disadvantages

- **High RTO**: Because the WAL is tightly coupled with the Datanode, the [Recovery Time Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Time_Objective) (RTO) is relatively high. After a Datanode restarts, it must replay the WAL to restore the latest data, and the node remains unavailable during that time.

- **Single-point access limitation**: The local WAL is tightly coupled with the Datanode process and supports only a single consumer, which limits features such as region hot standby and [Region Migration](/user-guide/deployments-administration/manage-data/region-migration.md).

## Remote WAL

### Advantages

- **Low RTO**: Decoupling the WAL from the Datanode minimizes the [Recovery Time Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Time_Objective) (RTO). When a Datanode crashes, the Metasrv initiates a [Region Failover](/user-guide/deployments-administration/manage-data/region-failover.md) to migrate the affected Regions to healthy nodes, with no need to replay the WAL locally.

- **Multi-consumer subscriptions**: Remote WAL supports multiple consumers subscribing to WAL logs at the same time, enabling features such as Region hot standby and [Region Migration](/user-guide/deployments-administration/manage-data/region-migration.md) and improving system availability and flexibility.

### Disadvantages

- **External dependencies**: Remote WAL depends on an external Kafka cluster, which increases deployment and operational complexity.

- **Network overhead**: WAL data must be transmitted over the network, so cluster network bandwidth needs careful planning to ensure low latency and high throughput, especially under write-heavy workloads.

## Next steps

- To configure local WAL storage, see [Local WAL](/user-guide/deployments-administration/wal/local-wal.md).

- To learn more about Remote WAL, see [Remote WAL](/user-guide/deployments-administration/wal/remote-wal/quick-start.md).