docs: add WAL documentation and dedicated WAL volume configuration

WenyXu · WenyXu · commit c0b6b77f73b4 · 2025-06-26T19:53:36.000+08:00
Signed-off-by: WenyXu <wenymedia@gmail.com>
diff --git a/docs/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md b/docs/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md
@@ -529,4 +529,19 @@ meta:
   enableRegionFailover: true
   configData: |
     allow_region_failover_on_local_wal = true
+```
+
+### Dedicated WAL Volume
+
+Configuring a dedicated WAL volume allows you to use a separate disk with a custom `StorageClass` for the WAL directory when deploying a GreptimeDB Datanode.
+
+```yaml
+dedicatedWAL:
+  enabled: true
+  raftEngine:
+    fs:
+      storageClassName: io2 # Use the AWS EBS io2 storage class for better WAL performance.
+      name: wal
+      storageSize: 20Gi
+      mountPath: /wal
 ```
diff --git a/docs/user-guide/deployments-administration/wal/configuration.md b/docs/user-guide/deployments-administration/wal/configuration.md
@@ -0,0 +1,41 @@
+---
+keywords: [Configuration, Local WAL, GreptimeDB Datanode, GreptimeDB]
+description: This section describes how to configure the Local WAL for the GreptimeDB Datanode component.
+---
+# Configuration
+
+This section describes how to configure the Local WAL for the GreptimeDB Datanode component.
+
+```toml
+[wal]
+provider = "raft_engine"
+file_size = "128MB"
+purge_threshold = "1GB"
+purge_interval = "1m"
+read_batch_size = 128
+sync_write = false
+```
+
+## Options
+
+If you deploy GreptimeDB with the Helm Chart, refer to [Common Helm Chart Configurations](/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md) to learn how to configure the Datanode by injecting configuration files.
+
+| Configuration Option | Description                                                                                                          | Default Value     | Provider      |
+| -------------------- | -------------------------------------------------------------------------------------------------------------------- | ----------------- | ------------- |
+| `provider`           | The provider of the WAL. Options: `raft_engine` (local file system storage) or `kafka` (remote WAL storage in Kafka) | `"raft_engine"`   | All           |
+| `dir`                | The directory where WAL logs are written                                                                             | `{data_home}/wal` | `raft_engine` |
+| `file_size`          | The maximum size of the WAL log file                                                                                 | `128MB`           | `raft_engine` |
+| `purge_threshold`    | The threshold of the WAL size to trigger purging                                                                     | `1GB`             | `raft_engine` |
+| `purge_interval`     | The interval to trigger purging                                                                                      | `1m`              | `raft_engine` |
+| `read_batch_size`    | The read batch size                                                                                                  | `128`             | `raft_engine` |
+| `sync_write`         | Whether to call `fsync` on every log write                                                                           | `false`           | `raft_engine` |
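+
+As a hedged illustration of how these options combine, the sketch below favors durability over write latency by enabling `sync_write`. The remaining values simply repeat the defaults from the table; whether the extra latency is acceptable depends on your workload and disk, so treat this as a starting point rather than a recommendation.
+
+```toml
+[wal]
+provider = "raft_engine"
+# Assumption for illustration: trade write latency for durability.
+sync_write = true
+# Defaults from the options table, repeated for clarity.
+file_size = "128MB"
+purge_threshold = "1GB"
+purge_interval = "1m"
+```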
+
+## Best practices
+
+### Use a separate high-performance volume for WAL
+It is beneficial to configure a separate volume for the WAL (Write-Ahead Log) directory when deploying GreptimeDB. This setup allows you to:
+
+- Leverage a high-performance disk—either a dedicated physical volume or one provisioned via a custom `StorageClass`.
+- Isolate WAL I/O from cache file access, reducing I/O contention and enhancing overall system performance.
+
+If you are using Helm Chart to deploy GreptimeDB, you can refer to [Common Helm Chart Configurations](/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md) to learn how to configure a dedicated WAL volume.
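+
+For deployments that do not use the Helm Chart, the same idea can be sketched by pointing the `dir` option at a path on the dedicated disk. The `/wal` path below is an assumption borrowed from the `mountPath` in the Helm Chart dedicated WAL example; substitute the location where your WAL volume is actually mounted.
+
+```toml
+[wal]
+provider = "raft_engine"
+# Hypothetical mount point of the dedicated WAL volume.
+dir = "/wal"
+```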
diff --git a/docs/user-guide/deployments-administration/wal/overview.md b/docs/user-guide/deployments-administration/wal/overview.md
@@ -0,0 +1,50 @@
+---
+keywords: [WAL, Write-Ahead Logging, Local WAL, Remote WAL, GreptimeDB]
+description: This section describes the WAL (Write-Ahead Logging) in GreptimeDB, including the advantages and disadvantages of Local WAL and Remote WAL.
+---
+# Overview
+
+[Write-Ahead Logging](/contributor-guide/datanode/wal.md#introduction) (WAL) is a crucial component of GreptimeDB. It persistently records every data modification so that data still held in memory is not lost if a failure occurs. GreptimeDB provides two WAL storage options:
+
+- **Local WAL**: Uses an embedded storage engine ([raft-engine](https://github.com/tikv/raft-engine)) within the [Datanode](/user-guide/concepts/why-greptimedb.md).
+
+- **Remote WAL**: Uses [Apache Kafka](https://kafka.apache.org/) as the external (remote) WAL storage component.
+
+## Local WAL
+
+### Advantages
+
+- **Low latency**: The local WAL runs within the Datanode process, eliminating network overhead and providing low write latency.
+
+- **Easy to deploy**: Since the WAL is co-located with the Datanode, no additional components are required, simplifying deployment and operations.
+
+- **Zero RPO**: When deploying GreptimeDB in the cloud, you can configure persistent storage for WAL data using cloud storage services such as AWS EBS or GCP Persistent Disk. This ensures zero [Recovery Point Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Point_Objective) (RPO), meaning no data loss, even in the event of system failure.
+
+### Disadvantages
+
+- **High RTO**: Because the WAL resides on the same node as the Datanode, the [Recovery Time Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Time_Objective) (RTO) is relatively high. After a Datanode restarts, it must replay the WAL to restore the latest data, during which time the node remains unavailable.
+
+- **Limited log subscriptions**: The local WAL can only support a single log subscription, as it is tightly coupled with the Datanode process. This limitation makes it difficult to support features like region hot standby and [Region Migration](/user-guide/deployments-administration/manage-data/region-migration.md).
+
+## Remote WAL
+
+### Advantages
+
+- **Zero RPO**: WAL data is stored independently of the Datanode, ensuring zero [Recovery Point Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Point_Objective) (RPO)—meaning no data loss, even in the event of system failure.
+
+- **Low RTO**: By decoupling WAL from the Datanode, [Recovery Time Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Time_Objective) (RTO) is minimized. If a Datanode crashes, the Metasrv can quickly trigger a [Region Failover](/user-guide/deployments-administration/manage-data/region-failover.md) to migrate affected regions to healthy nodes—without the need to replay WAL locally.
+
+- **Multiple log subscriptions**: Remote WAL supports multiple consumers, enabling features such as region hot standby and [Region Migration](/user-guide/deployments-administration/manage-data/region-migration.md). This enhances the high availability and flexibility of GreptimeDB deployments.
+
+### Disadvantages
+
+- **External dependencies**: Remote WAL relies on an external Kafka cluster, which increases the complexity of deployment, operation, and maintenance.
+
+- **Network overhead**: Since WAL data needs to be transmitted over the network, careful planning of cluster network bandwidth is required to ensure low latency and high throughput, especially under write-heavy workloads.
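+
+Which of the two options a Datanode uses is selected by the `provider` key of its `[wal]` section. The following is a minimal sketch of switching to Remote WAL; the `broker_endpoints` key and the address are assumptions not covered on this page, so check the Remote WAL documentation for the exact Kafka settings.
+
+```toml
+[wal]
+# Select Remote WAL (Kafka) instead of the default "raft_engine".
+provider = "kafka"
+# Assumed option name and placeholder address; verify against the Remote WAL docs.
+broker_endpoints = ["kafka.example.internal:9092"]
+```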
+
+
+## Next steps
+
+- To configure the Local WAL storage, please refer to [Configuration](/user-guide/deployments-administration/wal/configuration.md).
+
+- To learn more about the Remote WAL, please refer to [Remote WAL](/user-guide/deployments-administration/wal/remote-wal/overview.md).
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md b/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md
@@ -529,4 +529,19 @@ meta:
   enableRegionFailover: true
   configData: |
     allow_region_failover_on_local_wal = true
+```
+
+### Dedicated WAL Volume
+
+Configuring a dedicated WAL volume lets you use a separate disk, with a custom `StorageClass`, for the WAL (write-ahead log) directory of a GreptimeDB Datanode.
+
+```yaml
+dedicatedWAL:
+  enabled: true
+  raftEngine:
+    fs:
+      storageClassName: io2 # Use the AWS EBS io2 storage class for better WAL performance.
+      name: wal
+      storageSize: 20Gi
+      mountPath: /wal
 ```
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/wal/configuration.md b/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/wal/configuration.md
@@ -0,0 +1,45 @@
+---
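+keywords: [Configuration, Local WAL, GreptimeDB Datanode, GreptimeDB]
+description: Describes how to configure the Local WAL in GreptimeDB.
+---
+
+# Configuration
+
+This section describes how to configure the Local WAL for the GreptimeDB Datanode component.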
+
+```toml
+[wal]
+provider = "raft_engine"
+file_size = "128MB"
+purge_threshold = "1GB"
+purge_interval = "1m"
+read_batch_size = 128
+sync_write = false
+```
+
+## Options
+
+If you deploy GreptimeDB with the Helm Chart, refer to [Common Helm Chart Configurations](/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md) to learn how to configure the Datanode by injecting configuration files.
+
+| Configuration Option | Description                                                                                                          | Default Value     | Provider      |
+| -------------------- | -------------------------------------------------------------------------------------------------------------------- | ----------------- | ------------- |
+| `provider`           | The provider of the WAL. Options: `raft_engine` (local file system storage) or `kafka` (remote WAL storage in Kafka) | `"raft_engine"`   | All           |
+| `dir`                | The directory where WAL logs are written                                                                             | `{data_home}/wal` | `raft_engine` |
+| `file_size`          | The maximum size of the WAL log file                                                                                 | `128MB`           | `raft_engine` |
+| `purge_threshold`    | The threshold of the WAL size to trigger purging                                                                     | `1GB`             | `raft_engine` |
+| `purge_interval`     | The interval to trigger purging                                                                                      | `1m`              | `raft_engine` |
+| `read_batch_size`    | The read batch size                                                                                                  | `128`             | `raft_engine` |
+| `sync_write`         | Whether to call `fsync` on every log write                                                                           | `false`           | `raft_engine` |
+
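+## Best practices
+
+### Use a separate high-performance volume for the WAL directory
+When deploying GreptimeDB, configuring a separate volume for the WAL directory has significant advantages. Doing so lets you:
+
+
+- Use a high-performance disk, either a dedicated physical volume or a custom high-performance `StorageClass`, to improve WAL write throughput.
+- Isolate WAL I/O from cache file access, reducing I/O contention and improving overall system performance.
+
+
+If you deploy GreptimeDB with the Helm Chart, refer to [Common Helm Chart Configurations](/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md) to learn how to configure a dedicated WAL volume.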
+
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/wal/overview.md b/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/wal/overview.md
@@ -0,0 +1,53 @@
+---
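+keywords: [WAL, Write-Ahead Log, Local WAL, Remote WAL, GreptimeDB]
+description: Introduces the WAL (write-ahead log) in GreptimeDB, including the advantages and disadvantages of Local WAL and Remote WAL.
+---
+# Overview
+
+The [write-ahead log](/contributor-guide/datanode/wal.md#introduction) (WAL) is one of GreptimeDB's key components. It persistently records every data modification so that in-memory data is not lost when a failure occurs. GreptimeDB supports two WAL storage options:
+
+
+- **Local WAL**: Uses the embedded storage engine [raft-engine](https://github.com/tikv/raft-engine), integrated directly into the [Datanode](/user-guide/concepts/why-greptimedb.md) service.
+
+- **Remote WAL**: Uses [Apache Kafka](https://kafka.apache.org/) as an external WAL storage component.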
+
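+## Local WAL
+
+### Advantages
+
+- **Low latency**: The local WAL runs inside the Datanode process, avoiding network transfer overhead and providing lower write latency.
+
+- **Easy to deploy**: Because the WAL is tightly coupled with the Datanode, no additional components are needed, which simplifies deployment and operations.
+
+- **Zero RPO**: When deploying GreptimeDB in a cloud environment, you can persist WAL data with cloud storage services such as AWS EBS or GCP Persistent Disk. This achieves a zero [Recovery Point Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Point_Objective) (RPO): no written data is lost even if a failure occurs.
+
+### Disadvantages
+
+- **High RTO**: Because the WAL is tightly coupled with the Datanode, the [Recovery Time Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Time_Objective) (RTO) is relatively high. After a Datanode restarts, it must replay the WAL to restore the latest data, and the node stays unavailable during that time.
+
+- **Limited log subscriptions**: The local WAL supports only a single log subscription because it is tightly coupled with the Datanode process. This limits support for features such as region hot standby and [Region Migration](/user-guide/deployments-administration/manage-data/region-migration.md).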
+
+## Remote WAL
+
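+### Advantages
+
+- **Zero RPO**: WAL data is stored independently of the Datanode, ensuring a zero [Recovery Point Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Point_Objective) (RPO): no data is lost even if the system fails.
+
+- **Low RTO**: Decoupling the WAL from the Datanode minimizes the [Recovery Time Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Time_Objective) (RTO). When a Datanode crashes, the Metasrv initiates a [Region Failover](/user-guide/deployments-administration/manage-data/region-failover.md) that migrates the affected regions to healthy nodes, with no need to replay the WAL locally.
+
+
+- **Multiple log subscriptions**: Remote WAL supports multiple consumers, enabling features such as region hot standby and [Region Migration](/user-guide/deployments-administration/manage-data/region-migration.md), which improves the high availability and flexibility of GreptimeDB deployments.
+
+
+### Disadvantages
+
+- **External dependencies**: Remote WAL depends on an external Kafka cluster, which increases deployment and operational complexity.
+
+- **Network overhead**: WAL data must be transmitted over the network, so cluster network bandwidth needs careful planning to ensure low latency and high throughput, especially under write-heavy workloads.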
+
+
+## Next steps
+
+- To configure Local WAL storage, see [Configuration](/user-guide/deployments-administration/wal/configuration.md).
+
+- To learn more about Remote WAL, see [Remote WAL](/user-guide/deployments-administration/wal/remote-wal/overview.md).
diff --git a/sidebars.ts b/sidebars.ts
@@ -295,10 +295,13 @@ const sidebars: SidebarsConfig = {
                 'user-guide/deployments-administration/manage-metadata/manage-etcd',
               ],
             },
+
             {
               type: 'category',
               label: 'Write-Ahead Logging (WAL)',
               items: [
+                'user-guide/deployments-administration/wal/overview',
+                'user-guide/deployments-administration/wal/configuration',
                 {
                   type: 'category',
                   label: 'Remote WAL',
