docs: add WAL documentation and dedicated WAL volume configuration #1880

Merged · 3 commits · Jun 27, 2025
@@ -532,6 +532,21 @@ meta:

```

### Dedicated WAL Volume

Configuring a dedicated WAL volume allows you to use a separate disk with a custom `StorageClass` for the WAL directory when deploying a GreptimeDB Datanode.

```yaml
dedicatedWAL:
  enabled: true
  raftEngine:
    fs:
      storageClassName: io2  # Use the AWS EBS io2 storage class for WAL for better performance.
      name: wal
      storageSize: 20Gi
      mountPath: /wal
```

### Enable Remote WAL

To enable Remote WAL, both Metasrv and Datanode must be properly configured. Before proceeding, make sure to read the [Remote WAL Configuration](/user-guide/deployments-administration/wal/remote-wal/configuration.md) documentation for a complete overview of configuration options and important considerations.
@@ -552,4 +567,4 @@ datanode:
[wal]
provider = "kafka"
overwrite_entry_start_id = true
```
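
For orientation, a minimal sketch of injecting the Kafka WAL settings into both components through Helm values might look like the following. The `meta.configData` and `datanode.configData` fields and the `broker_endpoints` value are assumptions; verify them against the Remote WAL configuration guide and your chart version.

```yaml
# Sketch only: field names assumed from the Remote WAL configuration guide.
meta:
  configData: |
    [wal]
    provider = "kafka"
    broker_endpoints = ["kafka.kafka-cluster.svc:9092"]
datanode:
  configData: |
    [wal]
    provider = "kafka"
    broker_endpoints = ["kafka.kafka-cluster.svc:9092"]
    overwrite_entry_start_id = true
```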
41 changes: 41 additions & 0 deletions docs/user-guide/deployments-administration/wal/local-wal.md
@@ -0,0 +1,41 @@
---
keywords: [Configuration, Local WAL, GreptimeDB Datanode, GreptimeDB]
description: This section describes how to configure the local WAL for the GreptimeDB Datanode component.
---
# Local WAL

This section describes how to configure the local WAL for the GreptimeDB Datanode component.

```toml
[wal]
provider = "raft_engine"
file_size = "128MB"
purge_threshold = "1GB"
purge_interval = "1m"
read_batch_size = 128
sync_write = false
```

## Options

If you are using Helm Chart to deploy GreptimeDB, you can refer to [Common Helm Chart Configurations](/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md) to learn how to configure the Datanode by injecting configuration files.

| Configuration Option | Description | Default Value |
| -------------------- | -------------------------------------------------------------------------------------------------------------------- | ----------------- |
| `provider` | The provider of the WAL. Options: `raft_engine` (local file system storage) or `kafka` (remote WAL storage in Kafka) | `"raft_engine"` |
| `dir`                | The directory where WAL logs are written                                                                               | `{data_home}/wal` |
| `file_size`          | The size of a single WAL log file                                                                                      | `128MB`           |
| `purge_threshold`    | The WAL size threshold that triggers purging                                                                           | `1GB`             |
| `purge_interval`     | The interval at which purging is triggered                                                                             | `1m`              |
| `read_batch_size`    | The read batch size                                                                                                    | `128`             |
| `sync_write`         | Whether to call fsync for every log write                                                                              | `false`           |
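
If you go the Helm route, a minimal sketch of injecting the TOML above into the Datanode might look like this; the `datanode.configData` field is an assumption based on the Common Helm Chart Configurations page, so check it against your chart version.

```yaml
datanode:
  configData: |
    [wal]
    provider = "raft_engine"
    # dir defaults to {data_home}/wal; set it only if the WAL should live on its own mount.
    dir = "/wal"
    file_size = "128MB"
    purge_threshold = "1GB"
    purge_interval = "1m"
    sync_write = false
```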

## Best practices

### Using a Separate High-Performance Volume for WAL

It is beneficial to configure a separate volume for the WAL (Write-Ahead Log) directory when deploying GreptimeDB. This setup allows you to:

- Leverage a high-performance disk—either a dedicated physical volume or one provisioned via a custom `StorageClass`.
- Isolate WAL I/O from cache file access, reducing I/O contention and enhancing overall system performance.

If you are using Helm Chart to deploy GreptimeDB, you can refer to [Common Helm Chart Configurations](/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md) to learn how to configure a dedicated WAL volume.
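
For reference, a minimal values sketch for such a dedicated WAL volume might look like the following; the `dedicatedWAL` block mirrors the example in the Common Helm Chart Configurations page, and `io2` is only an illustrative storage class.

```yaml
dedicatedWAL:
  enabled: true
  raftEngine:
    fs:
      storageClassName: io2  # Substitute any high-performance StorageClass available in your cluster.
      name: wal
      storageSize: 20Gi
      mountPath: /wal
```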
48 changes: 48 additions & 0 deletions docs/user-guide/deployments-administration/wal/overview.md
@@ -0,0 +1,48 @@
---
keywords: [WAL, Write-Ahead Logging, Local WAL, Remote WAL, GreptimeDB]
description: This section describes the WAL (Write-Ahead Logging) in GreptimeDB, including the advantages and disadvantages of Local WAL and Remote WAL.
---
# Overview

The [Write-Ahead Log](/contributor-guide/datanode/wal.md#introduction) (WAL) is a crucial component in GreptimeDB that persistently records every data modification, ensuring that data cached in memory is not lost. GreptimeDB provides two WAL storage options:

- **Local WAL**: Uses an embedded storage engine ([raft-engine](https://github.com/tikv/raft-engine)) within the [Datanode](/user-guide/concepts/why-greptimedb.md).

- **Remote WAL**: Uses [Apache Kafka](https://kafka.apache.org/) as the external (remote) WAL storage component.

## Local WAL

### Advantages

- **Low latency**: The local WAL is stored within the same process as the Datanode, eliminating network overhead and providing low write latency.

- **Easy to deploy**: Since the WAL is co-located with the Datanode, no additional components are required, simplifying deployment and operations.

- **Zero RPO**: When deploying GreptimeDB in the cloud, you can configure persistent storage for WAL data using cloud storage services such as AWS EBS or GCP Persistent Disk. This ensures zero [Recovery Point Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Point_Objective) (RPO), meaning no data loss, even in the event of system failure.

### Disadvantages

- **High RTO**: Because the WAL resides on the same node as the Datanode, the [Recovery Time Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Time_Objective) (RTO) is relatively high. After a Datanode restarts, it must replay the WAL to restore the latest data, during which time the node remains unavailable.

- **Single-Point Access Limitation**: The local WAL is tightly coupled with the Datanode process and only supports a single consumer, which limits features such as region hot standby and [Region Migration](/user-guide/deployments-administration/manage-data/region-migration.md).

## Remote WAL

### Advantages

- **Low RTO**: By decoupling WAL from the Datanode, [Recovery Time Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Time_Objective) (RTO) is minimized. If a Datanode crashes, the Metasrv can quickly trigger a [Region Failover](/user-guide/deployments-administration/manage-data/region-failover.md) to migrate affected regions to healthy nodes—without the need to replay WAL locally.

- **Multi-Consumer Subscriptions**: Remote WAL supports multiple consumers subscribing to WAL logs simultaneously, enabling features such as region hot standby and [Region Migration](/user-guide/deployments-administration/manage-data/region-migration.md), thereby enhancing system availability and flexibility.

### Disadvantages

- **External dependencies**: Remote WAL relies on an external Kafka cluster, which increases the complexity of deployment, operation, and maintenance.

- **Network overhead**: Since WAL data needs to be transmitted over the network, careful planning of cluster network bandwidth is required to ensure low latency and high throughput, especially under write-heavy workloads.


## Next steps

- To configure the Local WAL storage, please refer to [Local WAL](/user-guide/deployments-administration/wal/local-wal.md).

- To learn more about the Remote WAL, please refer to [Remote WAL](/user-guide/deployments-administration/wal/remote-wal/quick-start.md).
@@ -531,9 +531,24 @@ meta:
allow_region_failover_on_local_wal = true
```

### Enable Remote WAL

### Dedicated WAL Volume

Configuring a dedicated WAL volume lets you use a separate disk with a custom `StorageClass` for the WAL (Write-Ahead Log) directory of the GreptimeDB Datanode.

```yaml
dedicatedWAL:
  enabled: true
  raftEngine:
    fs:
      storageClassName: io2  # Use the AWS EBS io2 storage class for better performance.
      name: wal
      storageSize: 20Gi
      mountPath: /wal
```

### Enable Remote WAL

Before enabling it, be sure to read the [Remote WAL Configuration](/user-guide/deployments-administration/wal/remote-wal/configuration.md) documentation for a complete overview of the configuration options and related considerations.

```yaml
@@ -0,0 +1,45 @@
---
keywords: [Configuration, Local WAL, GreptimeDB Datanode, GreptimeDB]
description: Describes how to configure the local WAL in GreptimeDB.
---

# Local WAL

This section describes how to configure the local WAL for the GreptimeDB Datanode component.

```toml
[wal]
provider = "raft_engine"
file_size = "128MB"
purge_threshold = "1GB"
purge_interval = "1m"
read_batch_size = 128
sync_write = false
```

## Options

If you are using Helm Chart to deploy GreptimeDB, you can refer to [Common Helm Chart Configurations](/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md) to learn how to configure the Datanode by injecting configuration files.

| Configuration Option | Description                                                                                                            | Default Value     |
| -------------------- | ---------------------------------------------------------------------------------------------------------------------- | ----------------- |
| `provider`           | The provider of the WAL. Options: `raft_engine` (local file system storage) or `kafka` (remote WAL storage in Kafka)  | `"raft_engine"`   |
| `dir`                | The directory where WAL logs are written                                                                               | `{data_home}/wal` |
| `file_size`          | The size of a single WAL log file                                                                                      | `128MB`           |
| `purge_threshold`    | The WAL size threshold that triggers purging                                                                           | `1GB`             |
| `purge_interval`     | The interval at which purging is triggered                                                                             | `1m`              |
| `read_batch_size`    | The read batch size                                                                                                    | `128`             |
| `sync_write`         | Whether to call fsync for every log write                                                                              | `false`           |

## Best practices

### Using a Separate High-Performance Volume for WAL

Configuring a separate volume for the WAL directory when deploying GreptimeDB brings clear benefits. It allows you to:

- Use a high-performance disk, either a dedicated physical volume or a custom high-performance `StorageClass`, to improve WAL write throughput.
- Isolate WAL I/O from cache file access, reducing I/O contention and improving overall system performance.

If you are using Helm Chart to deploy GreptimeDB, you can refer to [Common Helm Chart Configurations](/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md) to learn how to configure a dedicated WAL volume.

@@ -0,0 +1,51 @@
---
keywords: [WAL, Write-Ahead Logging, Local WAL, Remote WAL, GreptimeDB]
description: Describes the WAL (Write-Ahead Log) in GreptimeDB, including the advantages and disadvantages of the local WAL and the remote WAL.
---
# Overview

The [Write-Ahead Log](/contributor-guide/datanode/wal.md#introduction) (WAL) is one of the key components of GreptimeDB. It persistently records every data modification so that data held in memory is not lost when a failure occurs. GreptimeDB supports two WAL storage options:

- **Local WAL**: Uses the embedded storage engine [raft-engine](https://github.com/tikv/raft-engine), integrated directly into the [Datanode](/user-guide/concepts/why-greptimedb.md) service.

- **Remote WAL**: Uses [Apache Kafka](https://kafka.apache.org/) as the external WAL storage component.

## Local WAL

### Advantages

- **Low latency**: The local WAL runs inside the Datanode process, avoiding network transmission overhead and providing lower write latency.

- **Easy to deploy**: Since the WAL is co-located with the Datanode, no additional components are required, which simplifies deployment and operations.

- **Zero RPO**: When deploying GreptimeDB in the cloud, WAL data can be persisted with cloud storage services (such as AWS EBS or GCP Persistent Disk), achieving a zero [Recovery Point Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Point_Objective) (RPO); no written data is lost even if a failure occurs.

### Disadvantages

- **High RTO**: Because the WAL is tightly coupled with the Datanode, the [Recovery Time Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Time_Objective) (RTO) is relatively high. After a Datanode restarts, it must replay the WAL to restore the latest data, and the node remains unavailable during this time.

- **Single-point access limitation**: The local WAL is tightly coupled with the Datanode process and supports only a single consumer, which limits features such as region hot standby and [Region Migration](/user-guide/deployments-administration/manage-data/region-migration.md).

## Remote WAL

### Advantages

- **Low RTO**: By decoupling the WAL from the Datanode, the [Recovery Time Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Time_Objective) (RTO) is minimized. When a Datanode crashes, the Metasrv initiates a [Region Failover](/user-guide/deployments-administration/manage-data/region-failover.md) to migrate the affected regions to healthy nodes, without replaying the WAL locally.

- **Multi-consumer subscriptions**: The remote WAL allows multiple consumers to subscribe to WAL logs simultaneously, enabling features such as region hot standby and [Region Migration](/user-guide/deployments-administration/manage-data/region-migration.md), and improving the system's availability and flexibility.

### Disadvantages

- **External dependencies**: The remote WAL depends on an external Kafka cluster, which increases deployment and operational complexity.

- **Network overhead**: WAL data must be transmitted over the network, so cluster network bandwidth needs to be planned carefully to ensure low latency and high throughput, especially under write-heavy workloads.

## Next steps

- To configure local WAL storage, see [Local WAL](/user-guide/deployments-administration/wal/local-wal.md).

- To learn more about the remote WAL, see [Remote WAL](/user-guide/deployments-administration/wal/remote-wal/quick-start.md).
3 changes: 3 additions & 0 deletions sidebars.ts
@@ -295,10 +295,13 @@ const sidebars: SidebarsConfig = {
'user-guide/deployments-administration/manage-metadata/manage-etcd',
],
},

{
type: 'category',
label: 'Write-Ahead Logging (WAL)',
items: [
'user-guide/deployments-administration/wal/overview',
'user-guide/deployments-administration/wal/local-wal',
{
type: 'category',
label: 'Remote WAL',