
Commit e7e2bd2

shuiyisong, nicecui, and fengys1996 authored
docs: table suffix (#1614)
Co-authored-by: Yiran <[email protected]> Co-authored-by: fys <[email protected]>
1 parent 135d799 commit e7e2bd2

2 files changed: +108 −2 lines changed

docs/user-guide/logs/pipeline-config.md

Lines changed: 51 additions & 1 deletion
@@ -11,10 +11,16 @@ These configurations are provided in YAML format, allowing the Pipeline to proce
 ## Overall structure
 
-Pipeline consists of two parts: Processors and Transform, both of which are in array format. A Pipeline configuration can contain multiple Processors and multiple Transforms. The data type described by Transform determines the table structure when storing log data in the database.
+Pipeline consists of four parts: Processors, Dispatcher, Transform, and Table suffix.
+Processors pre-process the input log data.
+Dispatcher forwards the pipeline execution context to different subsequent pipelines.
+Transform decides the final data types and table structure in the database.
+Table suffix allows storing the data in different tables.
 
 - Processors are used for preprocessing log data, such as parsing time fields and replacing fields.
+- Dispatcher (optional) is used for forwarding the context to another pipeline, so that the same batch of input data can be split up and processed by different pipelines based on certain field values (see the sketch below).
 - Transform is used for converting data formats, such as converting string types to numeric types.
+- Table suffix (optional) is used for storing data in different tables for later convenience.
 
 Here is an example of a simple configuration that includes Processors and Transform:
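The Dispatcher bullet above is easier to follow next to a concrete block. A minimal sketch, mirroring the `dispatcher` example that appears in the i18n file later in this commit; `field`, `rules`, `value`, `table_suffix`, and `pipeline` are the keys shown there:

```yaml
# route rows by the value of the `type` field
dispatcher:
  field: type
  rules:
    - value: http
      table_suffix: http   # rows with type == "http" go to the table with suffix `http`
      pipeline: http       # ...and are processed by the pipeline named `http`
    - value: db
      table_suffix: db     # rows with type == "db" get the `db` suffix
```

Rows that match no rule are transformed by the current pipeline, as described in the Dispatcher section.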

@@ -45,6 +51,7 @@ transform:
     # epoch is a special field type and must specify precision
     type: epoch, ms
     index: timestamp
+table_suffix: _${string_field_a}
 ```
 
 ## Processor
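Because the hunk above shows only the tail of the example file, here is a minimal sketch of how the pieces fit together. The `urlencoding` processor name and the `ts` field are assumptions; only the lines visible in this diff are confirmed:

```yaml
processors:
  # assumed processor; the diff shows only `method: decode` and `ignore_missing: true`
  - urlencoding:
      fields:
        - string_field_a
        - string_field_b
      method: decode
      ignore_missing: true

transform:
  - fields:
      - string_field_a
      - string_field_b
    type: string
  - fields:
      - ts                  # hypothetical timestamp field
    # epoch is a special field type and must specify precision
    type: epoch, ms
    index: timestamp

# table_suffix sits at the top level of the pipeline file
table_suffix: _${string_field_a}
```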
@@ -770,3 +777,46 @@ matches the `http` rule, data is stored in `applogs_http`.
 
 If no rules match, data is transformed by the current pipeline's
 transformations.
+
+## Table suffix
+
+:::warning Experimental Feature
+This experimental feature may exhibit unexpected behavior, and its functionality may change in the future.
+:::
+
+There are cases where you want to split and insert log data into different target tables
+based on certain values in the input data. For example, you may want to divide and store the log data
+based on the application that produced it, adding an app name suffix to the target table.
+
+A sample configuration looks like this:
+```yaml
+table_suffix: _${app_name}
+```
+
+The syntax is simple: use `${}` to reference a variable in the pipeline execution context.
+The variable can come directly from the input data or be a product of an earlier processing step.
+After the table suffix is formatted, the whole string is appended to the input table name.
+
+Note:
+1. The variable must be an integer or a string.
+2. If any error occurs at runtime (e.g., the variable is missing or not a valid type), the input table
+name is used instead.
+
+Here is an example of how it works. The input data is as follows:
+```JSON
+[
+ {"type": "db"},
+ {"type": "http"},
+ {"t": "test"}
+]
+```
+
+The input table name is `persist_app`, and the pipeline config is:
+```YAML
+table_suffix: _${type}
+```
+
+These three lines of input log data will be inserted into three tables:
+1. `persist_app_db`
+2. `persist_app_http`
+3. `persist_app`, because that row has no `type` field, so the default table name is used.
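Because the formatted suffix is spliced verbatim into the table name, it can help to normalize the variable first. A minimal sketch, assuming the `gsub` processor accepts `fields`, `pattern`, and `replacement`, and using hypothetical `app_name` and `ts` fields:

```yaml
processors:
  - gsub:
      fields:
        - app_name
      pattern: '[^a-zA-Z0-9_]'   # replace characters that are awkward in table names
      replacement: '_'

transform:
  - fields:
      - app_name
    type: string
  - fields:
      - ts                       # hypothetical timestamp field
    type: epoch, ms
    index: timestamp

table_suffix: _${app_name}
```

With this config, an input row like `{"app_name": "front-end", "ts": 1716440000000}` would land in a table suffixed with `_front_end`.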

i18n/zh/docusaurus-plugin-content-docs/current/user-guide/logs/pipeline-config.md

Lines changed: 57 additions & 1 deletion
@@ -11,10 +11,16 @@ Pipeline is a mechanism in GreptimeDB for parsing and transforming log data,
 ## Overall structure
 
-Pipeline consists of two parts: Processors and Transform, both of which are arrays. A Pipeline configuration can contain multiple Processors and multiple Transforms. The data types described by Transform determine the table structure used when log data is saved to the database.
+Pipeline consists of four parts: Processors, Dispatcher, Transform, and Table suffix.
+Processors pre-process the data.
+Dispatcher can forward the pipeline execution context to different subsequent pipelines.
+Transform decides the final data types and table structure in the database.
+Table suffix supports saving data into different tables.
 
 - Processors are used to pre-process log data, such as parsing time fields and replacing fields.
+- Dispatcher (optional) is used to forward the execution context to another pipeline, so that the same batch of input data can be processed by different pipelines based on specific values.
 - Transform is used to convert data formats, such as converting string types to numeric types.
+- Table suffix (optional) is used to store data into different tables for later use.
 
 A simple configuration example containing Processors and Transform is as follows:
@@ -26,6 +32,14 @@ processors:
       - string_field_b
     method: decode
     ignore_missing: true
+dispatcher:
+  field: type
+  rules:
+    - value: http
+      table_suffix: http
+      pipeline: http
+    - value: db
+      table_suffix: db
 transform:
   - fields:
       - string_field_a
@@ -37,6 +51,7 @@ transform:
     # epoch is a special field type and must specify precision
     type: epoch, ms
     index: timestamp
+table_suffix: _${string_field_a}
 ```
 
 ## Processor
@@ -779,3 +794,44 @@ The Dispatcher runs after the processors. When a matching rule is found, the next
 matches the `http` rule, the final table name is `applogs_http`.
 
 If no rule matches, the data is processed by the transform rules defined in the current pipeline.
+
+## Table suffix
+
+:::warning Experimental Feature
+This experimental feature may behave unexpectedly, and its functionality may change in the future.
+:::
+
+In some scenarios, you may need to save ingested log data to different tables based on input field values.
+For example, you may want to save logs to different tables by the name of the application that produced them, appending the application name as a suffix to the table name.
+
+A sample configuration:
+```yaml
+table_suffix: _${app_name}
+```
+
+The syntax is simple: use `${}` to reference a variable in the pipeline execution context.
+The variable can exist directly in the input data or be produced by an earlier processing step.
+After variable substitution, the whole string is appended to the input table name.
+
+Note:
+1. The referenced variable must be an integer or a string.
+2. If any error occurs during execution (e.g., the variable does not exist or has an invalid type), the input table name is used as the final table name.
+
+Here is an example. The input data is:
+```JSON
+[
+ {"type": "db"},
+ {"type": "http"},
+ {"t": "test"}
+]
+```
+
+The input table name is `persist_app`, and the pipeline configuration is:
+```YAML
+table_suffix: _${type}
+```
+
+These three lines of input data will be written to three different tables:
+1. `persist_app_db`
+2. `persist_app_http`
+3. `persist_app`, because the input data has no `type` field, so the default table name is used.
