
Commit e7e2bd2

shuiyisong, nicecui, and fengys1996 authored
docs: table suffix (#1614)
Co-authored-by: Yiran <[email protected]> Co-authored-by: fys <[email protected]>
1 parent 135d799 commit e7e2bd2

2 files changed: +108 −2 lines changed

docs/user-guide/logs/pipeline-config.md

Lines changed: 51 additions & 1 deletion
@@ -11,10 +11,16 @@ These configurations are provided in YAML format, allowing the Pipeline to proce
 ## Overall structure
 
-Pipeline consists of two parts: Processors and Transform, both of which are in array format. A Pipeline configuration can contain multiple Processors and multiple Transforms. The data type described by Transform determines the table structure when storing log data in the database.
+Pipeline consists of four parts: Processors, Dispatcher, Transform, and Table suffix.
+Processors pre-process the input log data.
+Dispatcher forwards the pipeline execution context to different subsequent pipelines.
+Transform decides the final data types and table structure in the database.
+Table suffix allows storing the data in different tables.
 
 - Processors are used for preprocessing log data, such as parsing time fields and replacing fields.
+- Dispatcher (optional) is used for forwarding the context to another pipeline, so that the same batch of input data can be split up and processed by different pipelines based on certain field values (see the sketch below).
 - Transform is used for converting data formats, such as converting string types to numeric types.
+- Table suffix (optional) is used for storing data in different tables for later convenience.
 
 Here is an example of a simple configuration that includes Processors and Transform:
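The Dispatcher bullet above is easier to follow next to a concrete block. A minimal sketch, mirroring the `dispatcher` example that appears in the i18n file later in this commit; `field`, `rules`, `value`, `table_suffix`, and `pipeline` are the keys shown there:

```yaml
# route rows by the value of the `type` field
dispatcher:
  field: type
  rules:
    - value: http
      table_suffix: http   # rows with type == "http" go to the table with suffix `http`
      pipeline: http       # ...and are processed by the pipeline named `http`
    - value: db
      table_suffix: db     # rows with type == "db" get the `db` suffix
```

Rows that match no rule are transformed by the current pipeline, as described in the Dispatcher section.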

@@ -45,6 +51,7 @@ transform:
     # epoch is a special field type and must specify precision
     type: epoch, ms
     index: timestamp
+table_suffix: _${string_field_a}
 ```
 
 ## Processor
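Because the hunk above shows only the tail of the example file, here is a minimal sketch of how the pieces fit together. The `urlencoding` processor name and the `ts` field are assumptions; only the lines visible in this diff are confirmed:

```yaml
processors:
  # assumed processor; the diff shows only `method: decode` and `ignore_missing: true`
  - urlencoding:
      fields:
        - string_field_a
        - string_field_b
      method: decode
      ignore_missing: true

transform:
  - fields:
      - string_field_a
      - string_field_b
    type: string
  - fields:
      - ts                  # hypothetical timestamp field
    # epoch is a special field type and must specify precision
    type: epoch, ms
    index: timestamp

# table_suffix sits at the top level of the pipeline file
table_suffix: _${string_field_a}
```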
@@ -770,3 +777,46 @@ matches the `http` rule, data is stored in `applogs_http`.
 
 If no rules match, data is transformed by the current pipeline's
 transformations.
+
+## Table suffix
+
+:::warning Experimental Feature
+This experimental feature may exhibit unexpected behavior, and its functionality may change in the future.
+:::
+
+There are cases where you want to split and insert log data into different target tables
+based on certain values in the input data. For example, you may want to divide and store the log data
+based on the application that produced it, adding an app name suffix to the target table.
+
+A sample configuration looks like this:
+```yaml
+table_suffix: _${app_name}
+```
+
+The syntax is simple: use `${}` to reference a variable in the pipeline execution context.
+The variable can come directly from the input data or be a product of an earlier processing step.
+After the table suffix is formatted, the whole string is appended to the input table name.
+
+Note:
+1. The variable must be an integer or a string.
+2. If any error occurs at runtime (e.g., the variable is missing or not a valid type), the input table
+name is used instead.
+
+Here is an example of how it works. The input data is as follows:
+```JSON
+[
+ {"type": "db"},
+ {"type": "http"},
+ {"t": "test"}
+]
+```
+
+The input table name is `persist_app`, and the pipeline config is:
+```YAML
+table_suffix: _${type}
+```
+
+These three lines of input log data will be inserted into three tables:
+1. `persist_app_db`
+2. `persist_app_http`
+3. `persist_app`, because that row has no `type` field, so the default table name is used.
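Because the formatted suffix is spliced verbatim into the table name, it can help to normalize the variable first. A minimal sketch, assuming the `gsub` processor accepts `fields`, `pattern`, and `replacement`, and using hypothetical `app_name` and `ts` fields:

```yaml
processors:
  - gsub:
      fields:
        - app_name
      pattern: '[^a-zA-Z0-9_]'   # replace characters that are awkward in table names
      replacement: '_'

transform:
  - fields:
      - app_name
    type: string
  - fields:
      - ts                       # hypothetical timestamp field
    type: epoch, ms
    index: timestamp

table_suffix: _${app_name}
```

With this config, an input row like `{"app_name": "front-end", "ts": 1716440000000}` would land in a table suffixed with `_front_end`.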

i18n/zh/docusaurus-plugin-content-docs/current/user-guide/logs/pipeline-config.md

Lines changed: 57 additions & 1 deletion
@@ -11,10 +11,16 @@ Pipeline is a mechanism in GreptimeDB for parsing and transforming log data,
 ## Overall structure
 
-Pipeline consists of two parts: Processors and Transform, both of which are arrays. A Pipeline configuration can contain multiple Processors and multiple Transforms. The data types described by Transform determine the table structure used when log data is saved to the database.
+Pipeline consists of four parts: Processors, Dispatcher, Transform, and Table suffix.
+Processors pre-process the data.
+Dispatcher can forward the pipeline execution context to different subsequent pipelines.
+Transform decides the final data types and table structure in the database.
+Table suffix supports saving data into different tables.
 
 - Processors are used to pre-process log data, such as parsing time fields and replacing fields.
+- Dispatcher (optional) is used to forward the execution context to another pipeline, so that the same batch of input data can be processed by different pipelines based on specific values.
 - Transform is used to convert data formats, such as converting string types to numeric types.
+- Table suffix (optional) is used to store data into different tables for later use.
 
 A simple configuration example containing Processors and Transform is as follows:
@@ -26,6 +32,14 @@ processors:
       - string_field_b
     method: decode
     ignore_missing: true
+dispatcher:
+  field: type
+  rules:
+    - value: http
+      table_suffix: http
+      pipeline: http
+    - value: db
+      table_suffix: db
 transform:
   - fields:
       - string_field_a
@@ -37,6 +51,7 @@ transform:
     # epoch is a special field type and must specify precision
     type: epoch, ms
     index: timestamp
+table_suffix: _${string_field_a}
 ```
 
 ## Processor
@@ -779,3 +794,44 @@ The Dispatcher runs after the processors. When a matching rule is found, the next
 matches the `http` rule, the final table name is `applogs_http`.
 
 If no rule matches, the data is processed by the transform rules defined in the current pipeline.
+
+## Table suffix
+
+:::warning Experimental Feature
+This experimental feature may behave unexpectedly, and its functionality may change in the future.
+:::
+
+In some scenarios, you may need to save ingested log data to different tables based on input field values.
+For example, you may want to save logs to different tables by the name of the application that produced them, appending the application name as a suffix to the table name.
+
+A sample configuration:
+```yaml
+table_suffix: _${app_name}
+```
+
+The syntax is simple: use `${}` to reference a variable in the pipeline execution context.
+The variable can exist directly in the input data or be produced by an earlier processing step.
+After variable substitution, the whole string is appended to the input table name.
+
+Note:
+1. The referenced variable must be an integer or a string.
+2. If any error occurs during execution (e.g., the variable does not exist or has an invalid type), the input table name is used as the final table name.
+
+Here is an example. The input data is:
+```JSON
+[
+ {"type": "db"},
+ {"type": "http"},
+ {"t": "test"}
+]
+```
+
+The input table name is `persist_app`, and the pipeline configuration is:
+```YAML
+table_suffix: _${type}
+```
+
+These three lines of input data will be written to three different tables:
+1. `persist_app_db`
+2. `persist_app_http`
+3. `persist_app`, because the input data has no `type` field, so the default table name is used.
