Commit c0b6b77

docs: add WAL documentation and dedicated WAL volume configuration
Signed-off-by: WenyXu <[email protected]>
1 parent aace4ae commit c0b6b77

7 files changed: 222 additions, 0 deletions

docs/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md

Lines changed: 15 additions & 0 deletions
@@ -529,4 +529,19 @@ meta:
  enableRegionFailover: true
  configData: |
    allow_region_failover_on_local_wal = true
```

### Dedicated WAL Volume

Configuring a dedicated WAL volume lets you place the WAL directory of a GreptimeDB Datanode on a separate disk provisioned with a custom `StorageClass`.

```yaml
dedicatedWAL:
  enabled: true
  raftEngine:
    fs:
      storageClassName: io2 # Use the AWS EBS io2 storage class for better WAL performance.
      name: wal
      storageSize: 20Gi
      mountPath: /wal
```
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
---
keywords: [Configuration, Local WAL, GreptimeDB Datanode, GreptimeDB]
description: This section describes how to configure the Local WAL for the GreptimeDB Datanode component.
---
# Configuration

This section describes how to configure the Local WAL for the GreptimeDB Datanode component.

```toml
[wal]
provider = "raft_engine"
file_size = "128MB"
purge_threshold = "1GB"
purge_interval = "1m"
read_batch_size = 128
sync_write = false
```

## Options

If you deploy GreptimeDB with the Helm Chart, see [Common Helm Chart Configurations](/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md) to learn how to configure the Datanode by injecting configuration files.

| Configuration Option | Description                                                                                                     | Default Value     | Provider      |
| -------------------- | --------------------------------------------------------------------------------------------------------------- | ----------------- | ------------- |
| `provider`           | The WAL provider. Options: `raft_engine` (local file system storage) or `kafka` (remote WAL storage in Kafka)   | `"raft_engine"`   | All           |
| `dir`                | The directory where WAL files are written                                                                       | `{data_home}/wal` | `raft_engine` |
| `file_size`          | The maximum size of a WAL log file                                                                              | `128MB`           | `raft_engine` |
| `purge_threshold`    | The total WAL size that triggers purging                                                                        | `1GB`             | `raft_engine` |
| `purge_interval`     | The interval at which purging is triggered                                                                      | `1m`              | `raft_engine` |
| `read_batch_size`    | The read batch size                                                                                             | `128`             | `raft_engine` |
| `sync_write`         | Whether to call fsync on every log write                                                                        | `false`           | `raft_engine` |

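As a hedged illustration of the options in the table, the fragment below sketches a durability-oriented Local WAL setup. It uses only option names listed above; the choice of values is an assumption for illustration, not a recommendation, and enabling `sync_write` typically trades write latency for durability.

```toml
[wal]
provider = "raft_engine"
# Assumption: fsync on every log write; expect higher write latency.
sync_write = true
# Default values from the table, repeated here for clarity.
file_size = "128MB"
purge_threshold = "1GB"
purge_interval = "1m"
```
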
## Best practices

### Using a separate high-performance volume for WAL

When deploying GreptimeDB, it is beneficial to configure a separate volume for the WAL (Write-Ahead Log) directory. This setup allows you to:

- Use a high-performance disk, either a dedicated physical volume or one provisioned through a custom `StorageClass`.
- Isolate WAL I/O from cache file access, which reduces I/O contention and improves overall system performance.

If you deploy GreptimeDB with the Helm Chart, see [Common Helm Chart Configurations](/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md) to learn how to configure a dedicated WAL volume.
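
For deployments that do not use the Helm Chart, one way to apply this practice is to point the `dir` option at a mount backed by the dedicated disk. A minimal sketch, assuming such a disk is already mounted at `/wal` (a hypothetical path, chosen to mirror the `mountPath` in the Helm Chart dedicated WAL example):

```toml
[wal]
provider = "raft_engine"
# Hypothetical mount point of the dedicated high-performance volume.
dir = "/wal"
```

Without an explicit `dir`, the WAL is written to `{data_home}/wal`, which normally sits on the same volume as the rest of the data home.
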
Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
---
keywords: [WAL, Write-Ahead Logging, Local WAL, Remote WAL, GreptimeDB]
description: This section describes the WAL (Write-Ahead Logging) in GreptimeDB, including the advantages and disadvantages of Local WAL and Remote WAL.
---
# Overview

[Write-Ahead Logging](/contributor-guide/datanode/wal.md#introduction) (WAL) is a crucial component of GreptimeDB. It persistently records every data modification so that data held only in memory is not lost on failure. GreptimeDB provides two WAL storage options:

- **Local WAL**: Uses an embedded storage engine ([raft-engine](https://github.com/tikv/raft-engine)) within the [Datanode](/user-guide/concepts/why-greptimedb.md).

- **Remote WAL**: Uses [Apache Kafka](https://kafka.apache.org/) as an external (remote) WAL storage component.

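In the Datanode configuration, the choice between the two is expressed through the `provider` option of the `[wal]` section. A minimal sketch showing only that option; the Kafka-specific settings a Remote WAL needs are intentionally omitted here and are covered in the Remote WAL documentation:

```toml
[wal]
# "raft_engine" selects the Local WAL (the default);
# "kafka" selects the Remote WAL.
provider = "raft_engine"
```
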
## Local WAL

### Advantages

- **Low latency**: The local WAL runs within the same process as the Datanode, eliminating network overhead and providing low write latency.

- **Easy to deploy**: Since the WAL is co-located with the Datanode, no additional components are required, simplifying deployment and operations.

- **Zero RPO**: When deploying GreptimeDB in the cloud, you can configure persistent storage for WAL data using cloud storage services such as AWS EBS or GCP Persistent Disk. This ensures zero [Recovery Point Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Point_Objective) (RPO), meaning no data loss, even in the event of system failure.

### Disadvantages

- **High RTO**: Because the WAL resides on the same node as the Datanode, the [Recovery Time Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Time_Objective) (RTO) is relatively high. After a Datanode restarts, it must replay the WAL to restore the latest data, and the node remains unavailable until the replay finishes.

- **Limited log subscriptions**: The local WAL supports only a single log subscription because it is tightly coupled with the Datanode process. This limitation makes it difficult to support features like region hot standby and [Region Migration](/user-guide/deployments-administration/manage-data/region-migration.md).

## Remote WAL

### Advantages

- **Zero RPO**: WAL data is stored independently of the Datanode, ensuring zero [Recovery Point Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Point_Objective) (RPO), meaning no data loss, even in the event of system failure.

- **Low RTO**: By decoupling the WAL from the Datanode, the [Recovery Time Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Time_Objective) (RTO) is minimized. If a Datanode crashes, the Metasrv can quickly trigger a [Region Failover](/user-guide/deployments-administration/manage-data/region-failover.md) to migrate affected regions to healthy nodes, without the need to replay the WAL locally.

- **Multiple log subscriptions**: Remote WAL supports multiple consumers, enabling features such as region hot standby and [Region Migration](/user-guide/deployments-administration/manage-data/region-migration.md). This enhances the high availability and flexibility of GreptimeDB deployments.

### Disadvantages

- **External dependencies**: Remote WAL relies on an external Kafka cluster, which increases the complexity of deployment, operation, and maintenance.

- **Network overhead**: Since WAL data is transmitted over the network, cluster network bandwidth must be planned carefully to ensure low latency and high throughput, especially under write-heavy workloads.

## Next steps

- To configure the Local WAL storage, see [Configuration](/user-guide/deployments-administration/wal/configuration.md).

- To learn more about the Remote WAL, see [Remote WAL](/user-guide/deployments-administration/wal/remote-wal/overview.md).

i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md

Lines changed: 15 additions & 0 deletions
@@ -529,4 +529,19 @@ meta:
  enableRegionFailover: true
  configData: |
    allow_region_failover_on_local_wal = true
```

### Dedicated WAL Volume

When you configure a dedicated WAL volume, the WAL (write-ahead log) directory of the GreptimeDB Datanode can use a separate disk with a custom `StorageClass`.

```yaml
dedicatedWAL:
  enabled: true
  raftEngine:
    fs:
      storageClassName: io2 # Use the AWS EBS io2 storage class for better WAL performance.
      name: wal
      storageSize: 20Gi
      mountPath: /wal
```
@@ -0,0 +1,45 @@
---
keywords: [Configuration, Local WAL, GreptimeDB Datanode, GreptimeDB]
description: Describes how to configure the Local WAL in GreptimeDB.
---

# Configuration

This section describes how to configure the Local WAL for the GreptimeDB Datanode component.

```toml
[wal]
provider = "raft_engine"
file_size = "128MB"
purge_threshold = "1GB"
purge_interval = "1m"
read_batch_size = 128
sync_write = false
```

## Options

If you deploy GreptimeDB with the Helm Chart, see [Common Helm Chart Configurations](/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md) to learn how to configure the Datanode by injecting configuration files.

| Configuration Option | Description                                                                                                     | Default Value     | Provider      |
| -------------------- | --------------------------------------------------------------------------------------------------------------- | ----------------- | ------------- |
| `provider`           | The WAL provider. Options: `raft_engine` (local file system storage) or `kafka` (remote WAL storage in Kafka)   | `"raft_engine"`   | All           |
| `dir`                | The directory where WAL files are written                                                                       | `{data_home}/wal` | `raft_engine` |
| `file_size`          | The maximum size of a WAL log file                                                                              | `128MB`           | `raft_engine` |
| `purge_threshold`    | The total WAL size that triggers purging                                                                        | `1GB`             | `raft_engine` |
| `purge_interval`     | The interval at which purging is triggered                                                                      | `1m`              | `raft_engine` |
| `read_batch_size`    | The read batch size                                                                                             | `128`             | `raft_engine` |
| `sync_write`         | Whether to call fsync on every log write                                                                        | `false`           | `raft_engine` |

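## Best practices

### Using a separate high-performance volume as the WAL directory

When deploying GreptimeDB, configuring a separate volume as the WAL directory has clear benefits. It lets you:

- Use a high-performance disk, either a dedicated physical volume or a custom high-performance `StorageClass`, to increase WAL write throughput.
- Isolate WAL I/O from cache file access, which reduces I/O contention and improves overall system performance.

If you deploy GreptimeDB with the Helm Chart, see [Common Helm Chart Configurations](/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md) to learn how to configure a dedicated WAL volume.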
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
---
keywords: [WAL, Write-Ahead Logging, Local WAL, Remote WAL, GreptimeDB]
description: Describes the WAL (Write-Ahead Logging) in GreptimeDB, including the advantages and disadvantages of Local WAL and Remote WAL.
---
# Overview

[Write-Ahead Logging](/contributor-guide/datanode/wal.md#introduction) (WAL) is one of the key components of GreptimeDB. It persistently records every data modification so that in-memory data is not lost when a failure occurs. GreptimeDB supports two WAL storage options:

- **Local WAL**: Uses the embedded storage engine [raft-engine](https://github.com/tikv/raft-engine), integrated directly into the [Datanode](/user-guide/concepts/why-greptimedb.md) service.

- **Remote WAL**: Uses [Apache Kafka](https://kafka.apache.org/) as an external WAL storage component.

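## Local WAL

### Advantages

- **Low latency**: The local WAL runs inside the Datanode process, which avoids network transfer overhead and provides lower write latency.

- **Easy to deploy**: Because the WAL is tightly coupled with the Datanode, no additional components are needed, making deployment and operations simpler.

- **Zero RPO**: When deploying GreptimeDB in a cloud environment, you can persist WAL data with cloud storage services such as AWS EBS or GCP Persistent Disk. This achieves zero [Recovery Point Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Point_Objective) (RPO): no written data is lost even if a failure occurs.

### Disadvantages

- **High RTO**: Because the WAL is tightly coupled with the Datanode, the [Recovery Time Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Time_Objective) (RTO) is relatively high. After a Datanode restarts, it must replay the WAL to restore the latest data, and the node remains unavailable during that time.

- **Limited log subscriptions**: The local WAL supports only a single log subscription because it is tightly coupled with the Datanode process. This limits support for features such as region hot standby and [Region Migration](/user-guide/deployments-administration/manage-data/region-migration.md).
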
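## Remote WAL

### Advantages

- **Zero RPO**: WAL data is stored independently of the Datanode, ensuring zero [Recovery Point Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Point_Objective) (RPO): no data is lost even if the system fails.

- **Low RTO**: By decoupling the WAL from the Datanode, the [Recovery Time Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Time_Objective) (RTO) is minimized. When a Datanode crashes, the Metasrv initiates a [Region Failover](/user-guide/deployments-administration/manage-data/region-failover.md) to migrate the affected Regions to healthy nodes, with no need to replay the WAL locally.

- **Multiple log subscriptions**: Remote WAL supports multiple consumers, enabling features such as region hot standby and [Region Migration](/user-guide/deployments-administration/manage-data/region-migration.md), which improves the high availability and flexibility of GreptimeDB deployments.

### Disadvantages

- **External dependencies**: Remote WAL depends on an external Kafka cluster, which increases deployment and operational complexity.

- **Network overhead**: WAL data must be transmitted over the network, so cluster network bandwidth needs to be planned properly to ensure low latency and high throughput, especially under write-intensive workloads.
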
## Next steps

- To configure the Local WAL storage, see [Configuration](/user-guide/deployments-administration/wal/configuration.md).

- To learn more about the Remote WAL, see [Remote WAL](/user-guide/deployments-administration/wal/remote-wal/overview.md).

sidebars.ts

Lines changed: 3 additions & 0 deletions
@@ -295,10 +295,13 @@ const sidebars: SidebarsConfig = {
      'user-guide/deployments-administration/manage-metadata/manage-etcd',
    ],
  },

  {
    type: 'category',
    label: 'Write-Ahead Logging (WAL)',
    items: [
      'user-guide/deployments-administration/wal/overview',
      'user-guide/deployments-administration/wal/configuration',
      {
        type: 'category',
        label: 'Remote WAL',
