diff --git a/docs/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md b/docs/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md
index c59a84433..007284c7a 100644
--- a/docs/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md
+++ b/docs/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md
@@ -532,6 +532,21 @@ meta:
 ```
 
+### Dedicated WAL Volume
+
+Configuring a dedicated WAL volume allows you to place the WAL directory of a GreptimeDB Datanode on a separate disk provisioned through a custom `StorageClass`.
+
+```yaml
+dedicatedWAL:
+  enabled: true
+  raftEngine:
+    fs:
+      storageClassName: io2 # Use an AWS EBS io2 storage class for the WAL to get better performance.
+      name: wal
+      storageSize: 20Gi
+      mountPath: /wal
+```
+
 ### Enable Remote WAL
 
 To enable Remote WAL, both Metasrv and Datanode must be properly configured. Before proceeding, make sure to read the [Remote WAL Configuration](/user-guide/deployments-administration/wal/remote-wal/configuration.md) documentation for a complete overview of configuration options and important considerations.
 
@@ -552,4 +567,4 @@ datanode:
     [wal]
     provider = "kafka"
     overwrite_entry_start_id = true
-```
\ No newline at end of file
+```
diff --git a/docs/user-guide/deployments-administration/wal/local-wal.md b/docs/user-guide/deployments-administration/wal/local-wal.md
new file mode 100644
index 000000000..0d050dffa
--- /dev/null
+++ b/docs/user-guide/deployments-administration/wal/local-wal.md
@@ -0,0 +1,41 @@
+---
+keywords: [Configuration, Local WAL, GreptimeDB Datanode, GreptimeDB]
+description: This section describes how to configure the Local WAL for the GreptimeDB Datanode component.
+---
+# Local WAL
+
+This section describes how to configure the local WAL for the GreptimeDB Datanode component.
+
+```toml
+[wal]
+provider = "raft_engine"
+file_size = "128MB"
+purge_threshold = "1GB"
+purge_interval = "1m"
+read_batch_size = 128
+sync_write = false
+```
+
+## Options
+
+If you are using the Helm Chart to deploy GreptimeDB, refer to [Common Helm Chart Configurations](/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md) to learn how to configure the Datanode by injecting configuration files.
+
+| Configuration Option | Description | Default Value |
+| -------------------- | ----------- | ------------- |
+| `provider` | The WAL provider. Options: `raft_engine` (local file system storage) or `kafka` (remote WAL storage in Kafka) | `"raft_engine"` |
+| `dir` | The directory where WAL logs are written | `{data_home}/wal` |
+| `file_size` | The size of a single WAL log file | `128MB` |
+| `purge_threshold` | The total WAL size that triggers purging | `1GB` |
+| `purge_interval` | The interval at which purging is triggered | `1m` |
+| `read_batch_size` | The read batch size | `128` |
+| `sync_write` | Whether to call `fsync` after every log write | `false` |
+
+## Best practices
+
+### Using a separate high-performance volume for WAL
+It is beneficial to configure a separate volume for the WAL (Write-Ahead Log) directory when deploying GreptimeDB. This setup allows you to:
+
+- Use a high-performance disk, either a dedicated physical volume or one provisioned through a custom `StorageClass`.
+- Isolate WAL I/O from cache file access, reducing I/O contention and improving overall system performance.
+
+If you are using the Helm Chart to deploy GreptimeDB, refer to [Common Helm Chart Configurations](/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md) to learn how to configure a dedicated WAL volume.
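Outside Kubernetes, the dedicated-volume practice described in `local-wal.md` can be approximated directly in the Datanode configuration file. The TOML below is a minimal sketch, not an official example. It assumes that a separate high-performance disk is already mounted at the hypothetical path `/mnt/wal`. The only change is pointing the `dir` option at that mount; every other option stays at the defaults listed in the options table.

```toml
# Hypothetical Datanode config sketch.
# Assumption: a dedicated high-performance disk is mounted at /mnt/wal.
[wal]
provider = "raft_engine"
# Write WAL files to the dedicated disk instead of the default {data_home}/wal.
dir = "/mnt/wal"
# The remaining options keep the defaults from the options table.
file_size = "128MB"
purge_threshold = "1GB"
purge_interval = "1m"
read_batch_size = 128
sync_write = false
```

With the WAL on its own disk, WAL I/O no longer competes with cache file access, which is the benefit described in the best-practices section. The mount point and disk type are deployment choices, not GreptimeDB requirements.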
diff --git a/docs/user-guide/deployments-administration/wal/overview.md b/docs/user-guide/deployments-administration/wal/overview.md
new file mode 100644
index 000000000..ce6cd70b4
--- /dev/null
+++ b/docs/user-guide/deployments-administration/wal/overview.md
@@ -0,0 +1,48 @@
+---
+keywords: [WAL, Write-Ahead Logging, Local WAL, Remote WAL, GreptimeDB]
+description: This section describes the WAL (Write-Ahead Logging) in GreptimeDB, including the advantages and disadvantages of Local WAL and Remote WAL.
+---
+# Overview
+
+The [Write-Ahead Logging](/contributor-guide/datanode/wal.md#introduction) (WAL) is a crucial component in GreptimeDB. It persistently records every data modification so that data cached in memory is not lost if a failure occurs. GreptimeDB provides two WAL storage options:
+
+- **Local WAL**: Uses an embedded storage engine ([raft-engine](https://github.com/tikv/raft-engine)) within the [Datanode](/user-guide/concepts/why-greptimedb.md).
+
+- **Remote WAL**: Uses [Apache Kafka](https://kafka.apache.org/) as the external (remote) WAL storage component.
+
+## Local WAL
+
+### Advantages
+
+- **Low latency**: The local WAL runs within the Datanode process, which eliminates network overhead and provides low write latency.
+
+- **Easy to deploy**: Since the WAL is co-located with the Datanode, no additional components are required, simplifying deployment and operations.
+
+- **Zero RPO**: When deploying GreptimeDB in the cloud, you can store WAL data on persistent cloud storage such as AWS EBS or GCP Persistent Disk. This ensures a zero [Recovery Point Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Point_Objective) (RPO), meaning no written data is lost, even in the event of a system failure.
+
+### Disadvantages
+
+- **High RTO**: Because the WAL resides on the same node as the Datanode, the [Recovery Time Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Time_Objective) (RTO) is relatively high. After a Datanode restarts, it must replay the WAL to restore the latest data, and the node remains unavailable until the replay completes.
+
+- **Single-Point Access Limitation**: The local WAL is tightly coupled with the Datanode process and supports only a single consumer. This limits features such as region hot standby and [Region Migration](/user-guide/deployments-administration/manage-data/region-migration.md).
+
+## Remote WAL
+
+### Advantages
+
+- **Low RTO**: Decoupling the WAL from the Datanode minimizes the [Recovery Time Objective](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Time_Objective) (RTO). If a Datanode crashes, the Metasrv can quickly trigger a [Region Failover](/user-guide/deployments-administration/manage-data/region-failover.md) to migrate the affected regions to healthy nodes, without needing to replay the WAL locally.
+
+- **Multi-Consumer Subscriptions**: Remote WAL allows multiple consumers to subscribe to WAL logs simultaneously. This enables features such as region hot standby and [Region Migration](/user-guide/deployments-administration/manage-data/region-migration.md), which improves system availability and flexibility.
+
+### Disadvantages
+
+- **External dependencies**: Remote WAL relies on an external Kafka cluster, which increases the complexity of deployment, operation, and maintenance.
+
+- **Network overhead**: Because WAL data is transmitted over the network, cluster network bandwidth must be planned carefully to ensure low latency and high throughput, especially under write-heavy workloads.
+
+
+## Next steps
+
+- To configure Local WAL storage, refer to [Local WAL](/user-guide/deployments-administration/wal/local-wal.md).
+
+- To learn more about Remote WAL, refer to [Remote WAL](/user-guide/deployments-administration/wal/remote-wal/quick-start.md).
\ No newline at end of file
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md b/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md
index 82fff4ed4..59e2507ed 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md
@@ -531,9 +531,24 @@ meta:
     allow_region_failover_on_local_wal = true
 ```
 
-### 启用 Remote WAL
+### 专用 WAL 卷
+
+配置专用 WAL 卷时,可以为 GreptimeDB Datanode 的 WAL(预写日志)目录使用单独的磁盘,并指定自定义的 `StorageClass`。
+
+```yaml
+dedicatedWAL:
+  enabled: true
+  raftEngine:
+    fs:
+      storageClassName: io2 # 使用 AWS EBS io2 存储类以获得更好的性能
+      name: wal
+      storageSize: 20Gi
+      mountPath: /wal
+```
+### 启用 Remote WAL
+
 在启用前,请务必查阅 [Remote WAL 配置](/user-guide/deployments-administration/wal/remote-wal/configuration.md)文档,以了解完整的配置项说明及相关注意事项。
 
 ```yaml
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/wal/local-wal.md b/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/wal/local-wal.md
new file mode 100644
index 000000000..7ab12b00f
--- /dev/null
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/wal/local-wal.md
@@ -0,0 +1,45 @@
+---
+keywords: [配置, 本地 WAL, GreptimeDB Datanode, GreptimeDB]
+description: 介绍如何配置 GreptimeDB 中的本地 WAL。
+---
+
+# 本地 WAL
+
+本节介绍如何配置 GreptimeDB Datanode 组件的本地 WAL。
+
+```toml
+[wal]
+provider = "raft_engine"
+file_size = "128MB"
+purge_threshold = "1GB"
+purge_interval = "1m"
+read_batch_size = 128
+sync_write = false
+```
+
+## 选项
+
+如果你使用 Helm Chart 部署 GreptimeDB,可以参考[常见 Helm Chart 配置项](/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md)了解如何通过注入配置文件以配置 Datanode。
+
+| 配置项 | 描述 | 默认值 |
+| ----------------- | ----------------------------------------------------------------------------------------------- | ----------------- |
+| `provider` | WAL 的提供者。可选项:`raft_engine`(本地文件系统存储)或 `kafka`(使用 Kafka 的远程 WAL 存储) | `"raft_engine"` |
+| `dir` | 日志写入目录 | `{data_home}/wal` |
+| `file_size` | 单个 WAL 日志文件的大小 | `128MB` |
+| `purge_threshold` | 触发清理的 WAL 总大小阈值 | `1GB` |
+| `purge_interval` | 触发清理的时间间隔 | `1m` |
+| `read_batch_size` | 读取批次大小 | `128` |
+| `sync_write` | 是否在每次写入日志时调用 fsync | `false` |
+
+## 最佳实践
+
+### 使用单独的高性能卷作为 WAL 目录
+在部署 GreptimeDB 时,配置单独的卷作为 WAL 目录具有显著优势。这样做可以:
+
+
+- 使用高性能磁盘,包括专用物理卷或自定义的高性能 `StorageClass`,以提升 WAL 的写入吞吐量。
+- 隔离 WAL I/O 与缓存文件访问,降低 I/O 竞争,提升整体系统性能。
+
+
+如果你使用 Helm Chart 部署 GreptimeDB,可以参考[常见 Helm Chart 配置项](/user-guide/deployments-administration/deploy-on-kubernetes/common-helm-chart-configurations.md)了解如何配置专用 WAL 卷。
+
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/wal/overview.md b/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/wal/overview.md
new file mode 100644
index 000000000..2c92c1dbc
--- /dev/null
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/wal/overview.md
@@ -0,0 +1,51 @@
+---
+keywords: [WAL, 预写日志, 本地 WAL, Remote WAL, GreptimeDB]
+description: 介绍 GreptimeDB 中的 WAL(预写日志),包括本地 WAL 和远程 WAL 的优缺点。
+---
+# 概述
+
+[预写日志](/contributor-guide/datanode/wal.md#introduction)(WAL) 是 GreptimeDB 的关键组件之一,负责持久化记录每次数据修改操作,以确保内存中的数据在故障发生时不会丢失。GreptimeDB 支持两种 WAL 存储方案:
+
+
+- **本地 WAL**: 使用嵌入式存储引擎 [raft-engine](https://github.com/tikv/raft-engine),直接集成在 [Datanode](/user-guide/concepts/why-greptimedb.md) 服务中。
+
+- **Remote WAL**: 使用 [Apache Kafka](https://kafka.apache.org/) 作为外部的 WAL 存储组件。
+
+## 本地 WAL
+
+### 优点
+
+- **低延迟**: 本地 WAL 运行于 Datanode 进程内,避免了网络传输开销,提供更低的写入延迟。
+
+- **易于部署**: 由于 WAL 与 Datanode 紧耦合,无需引入额外组件,部署和运维更加简便。
+
+- **零 RPO**: 在云环境中部署 GreptimeDB 时,可以结合云存储服务(如 AWS EBS 或 GCP Persistent Disk)将 WAL 数据持久化存储,从而实现零[恢复点目标](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Point_Objective) (RPO),即使发生故障也不会丢失任何已写入的数据。
+
+### 缺点
+
+- **高 RTO**: 由于 WAL 与 Datanode 紧密耦合,[恢复时间目标](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Time_Objective) (RTO) 相对较高。在 Datanode 重启后,必须重放 WAL 以恢复最新数据,在此期间节点保持不可用。
+
+- **单点访问限制**: 本地 WAL 与 Datanode 进程紧密耦合,仅支持单个消费者访问,限制了 Region 热备和 [Region Migration](/user-guide/deployments-administration/manage-data/region-migration.md) 等功能的实现。
+
+## Remote WAL
+
+### 优点
+
+- **低 RTO**: 通过将 WAL 与 Datanode 解耦,[恢复时间目标](https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Time_Objective) (RTO) 得以最小化。当 Datanode 崩溃时,Metasrv 会发起 [Region Failover](/user-guide/deployments-administration/manage-data/region-failover.md),将受影响 Region 迁移至健康节点,无需本地重放 WAL。
+
+
+- **多消费者订阅**:Remote WAL 支持多个消费者同时订阅 WAL 日志,实现 Region 热备和 [Region Migration](/user-guide/deployments-administration/manage-data/region-migration.md) 等功能,提升系统的高可用性和灵活性。
+
+
+### 缺点
+
+- **外部依赖**: Remote WAL 依赖外部 Kafka 集群,增加了部署和运维复杂度。
+
+- **网络开销**: WAL 数据需通过网络传输,需合理规划集群网络带宽,确保低延迟与高吞吐量,尤其在写入密集型负载下。
+
+
+## 后续步骤
+
+- 如需配置本地 WAL 存储,请参阅[本地 WAL](/user-guide/deployments-administration/wal/local-wal.md)。
+
+- 想了解更多 Remote WAL 相关信息,请参阅 [Remote WAL](/user-guide/deployments-administration/wal/remote-wal/quick-start.md)。
\ No newline at end of file
diff --git a/sidebars.ts b/sidebars.ts
index dee0e0566..e9065917b 100644
--- a/sidebars.ts
+++ b/sidebars.ts
@@ -295,10 +295,13 @@ const sidebars: SidebarsConfig = {
             'user-guide/deployments-administration/manage-metadata/manage-etcd',
           ],
         },
+
         {
           type: 'category',
           label: 'Write-Ahead Logging (WAL)',
           items: [
+            'user-guide/deployments-administration/wal/overview',
+            'user-guide/deployments-administration/wal/local-wal',
             {
               type: 'category',
               label: 'Remote WAL',