diff --git a/docs/user-guide/administration/design-table.md b/docs/user-guide/administration/design-table.md
index eab37ba04..dd6273be9 100644
--- a/docs/user-guide/administration/design-table.md
+++ b/docs/user-guide/administration/design-table.md
@@ -17,222 +17,267 @@ Before proceeding, please review the GreptimeDB [Data Model Documentation](/user
## Basic Concepts
+### Cardinality
+
**Cardinality**: Refers to the number of unique values in a dataset. It can be classified as "high cardinality" or "low cardinality":
-- **Low Cardinality Example**: Order statuses like "Pending Payment/Completed Payment/Shipped/Completed"
- have about 4-5 unique values.
- Days of the week are fixed at 7,
- and the number of cities and provinces is also limited.
-- **High Cardinality Example**: User IDs can range from millions to tens of millions.
- Device IP addresses and product SKUs are other examples of high cardinality data.
+- **Low Cardinality**: Low cardinality columns have a small, bounded set of values.
+  The total number of unique values is usually no more than 10 thousand.
+ For example, `namespace`, `cluster`, `http_method` are usually low cardinality.
+- **High Cardinality**: High cardinality columns contain a large number of unique values.
+  For example, `trace_id`, `span_id`, `user_id`, `uri`, `ip`, `uuid`, `request_id`, auto-increment IDs, and timestamps are usually high cardinality.
+
+### Column Types
+
+In GreptimeDB, columns are categorized into three semantic types: `Tag`, `Field`, and `Timestamp`.
+The timestamp usually represents the time of data sampling or the occurrence time of logs/events.
+GreptimeDB uses the `TIME INDEX` constraint to identify the `Timestamp` column.
+So the `Timestamp` column is also called the `TIME INDEX` column.
+If a table has multiple columns with a timestamp data type, you can define only one of them as the `TIME INDEX`; the others become `Field` columns.
-## Column Types and Selection
+In GreptimeDB, tag columns are optional. The main purposes of tag columns include:
-In GreptimeDB, columns are categorized into three types: Tag, Field, and Time Index.
-The timestamp usually represents the time of data sampling or the occurrence time of logs/events and is used as the Time Index column.
-GreptimeDB optimizes data organization based on the Time Index to enhance query performance.
+1. Defining the ordering of data in storage.
+ GreptimeDB reuses the `PRIMARY KEY` constraint to define tag columns and the ordering of tags.
+ Unlike traditional databases, GreptimeDB defines time-series by the primary key.
+ Tables in GreptimeDB sort rows in the order of `(primary key, timestamp)`.
+ This improves the locality of data with the same tags.
+ If there are no tag columns, GreptimeDB sorts rows by timestamp.
+2. Identifying a unique time-series.
+ When the table is not append-only, GreptimeDB can deduplicate rows by timestamp under the same time-series (primary key).
+3. Smoothing the migration from other TSDBs that use tags or labels.
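+
+As a hedged sketch of point 1 (the `cpu_usage` table and its columns are hypothetical, not from this guide), declaring `host` in the `PRIMARY KEY` makes each `host` value identify one time-series and stores rows in `(host, ts)` order:
+
+```sql
+CREATE TABLE IF NOT EXISTS cpu_usage (
+  host STRING,
+  usage_percent DOUBLE,
+  ts TIMESTAMP TIME INDEX,
+  PRIMARY KEY(host)
+);
+```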
-### Tag Columns
-Tag columns, also known as label columns,
-generally carry metadata of the measured data or logs/events.
-For example, when collecting nationwide weather temperature data,
-the city (e.g., `city="New York"`) is a typical tag column.
-In monitoring, system metrics like CPU and memory usually involve a `host` tag to represent the hostname.
+## Primary key
-The main purposes of Tag columns in GreptimeDB include:
+### Primary key is optional
-1. Storing low-cardinality metadata.
-2. Filtering data, such as using the `city` column to view the average temperature in New York City over the past week. This is similar to the `WHERE` clause in SQL.
-3. Grouping and aggregating data. For instance, if the temperature data includes a `state` label in addition to `city`, you can group the data by `state` and calculate the average temperature for each `state` over the past week. This is similar to the `GROUP BY` clause in SQL.
+A poorly chosen primary key or index may significantly degrade performance.
+Generally you can create an append-only table without a primary key, since ordering data by timestamp is sufficient for many workloads.
+Such a table can also serve as a performance baseline.
-Recommendations for Tag columns:
+```sql
+CREATE TABLE http_logs (
+ access_time TIMESTAMP TIME INDEX,
+ application STRING,
+ remote_addr STRING,
+ http_status STRING,
+ http_method STRING,
+ http_refer STRING,
+ user_agent STRING,
+ request_id STRING,
+ request STRING
+) with ('append_mode'='true');
+```
-- Typically strings, avoiding `FLOAT` or `DOUBLE`.
-- The number of Tag columns in a table should be moderate, usually not exceeding 20.
-- Control the number of unique values in Tag columns to prevent high cardinality issues, which can negatively impact write performance and lead to index expansion.
-- Ensure that Tag column values remain relatively stable and do not change frequently. For instance, avoid using dynamic values such as serverless container host names as Tag columns.
+The `http_logs` table above is an example schema for storing HTTP server logs.
-### Field Columns
+- The `'append_mode'='true'` option creates the table as an append-only table.
+ This ensures one log entry doesn't overwrite another with the same timestamp.
+- The table sorts logs by time so it is efficient to search logs by time.
-Field columns generally carry the actual measured values. For example, the temperature value in weather data should be set as a Field column. In monitoring systems, CPU utilization, memory utilization, etc., are typical Field columns.
-Characteristics of Field columns:
+### When to use primary key
-1. Usually numerical types (integers, floating-point numbers), logs, and event messages are generally strings.
-2. Used for calculations and aggregations.
-3. Can change frequently, meaning they can have any cardinality.
+You can use a primary key when there are suitable low cardinality columns and one of the following conditions is met:
-Recommendations for Field columns:
+- Most queries can benefit from the ordering.
+- You need to deduplicate (including delete) rows by the primary key and time index.
-1. Avoid applying filter conditions on Field columns.
-2. Suitable for data that needs to be calculated and aggregated.
-3. Suitable for storing high-frequency changing data.
+For example, if you mostly query logs of a specific application, you may set the `application` column as the primary key (tag).
-### Tag Columns vs. Field Columns
+```sql
+SELECT request FROM http_logs WHERE application = 'greptimedb' AND access_time > now() - '5 minute'::INTERVAL;
+```
-| | Tag Columns | Field Columns |
-| ----- | ----------- | ------------- |
-| Usage Scenarios | - Data classification and filtering
- Create indexes to speed up queries
- Data grouping and recording contextual metadata | - Store actual measurement values and metrics
- Used for calculations and aggregations
- Target data for analysis|
-| Data Characteristics | - Usually string type
- Relatively stable, low change frequency
- Automatically indexed
- Usually low cardinality
- Indexes occupy additional storage space | - Usually numerical types (integers, floating-point numbers), logs/events may be strings
- High-frequency changes
- Not indexed
- Can be high cardinality
- Relatively low storage overhead |
-| Usage Recommendations | - Used for frequent query filter conditions
- Control cardinality to avoid index expansion
- Choose meaningful classification tags, avoid storing measurement values leading to high cardinality | - Store metrics that need to be calculated and aggregated
- Avoid using as query filter conditions
- Suitable for storing high-frequency changing data
- Used with timestamps for time series analysis |
-| Practical Examples | - Data center: `dc-01`
- Environment: `prod/dev`
- Service name: `api-server`
- Hostname: `host-01`
- City, e.g., `"New York"` | - CPU usage: `75.5`
- Memory usage: `4096MB`
- Request response time: `156ms`
- Temperature: `25.6°C`
- Queue length: `1000`|
+The number of applications is usually limited, so table `http_logs_v2` uses `application` as the primary key.
+It sorts logs by application, so querying logs of the same application is faster since the query only has to scan a small number of rows.
+Setting tags may also reduce disk space usage as it improves data locality.
-## Timeline
+```sql
+CREATE TABLE http_logs_v2 (
+ access_time TIMESTAMP TIME INDEX,
+ application STRING,
+ remote_addr STRING,
+ http_status STRING,
+ http_method STRING,
+ http_refer STRING,
+ user_agent STRING,
+ request_id STRING,
+ request STRING,
+ PRIMARY KEY(application)
+) with ('append_mode'='true');
+```
-The timeline is crucial in GreptimeDB's data model, closely related to Tag and Field columns, and is the foundation for efficient data storage and querying. A timeline is a collection of data points arranged in chronological order, identified by a unique set of Tags and a Time Index.
-Timelines enable GreptimeDB to efficiently process and store time series data. Unique tag sets can be used to quickly locate and retrieve data within a specific time range, and storage can be optimized to reduce redundancy.
-Understanding the concept of the timeline is key to designing table structures and optimizing query performance. Properly organizing Tag columns, Field columns, and the Time Index can create an efficient data model that meets business needs.
+To improve sorting and deduplication speed under time-series workloads, GreptimeDB buffers and processes rows by time-series internally,
+so it doesn't need to compare the primary key of each row repeatedly.
+This becomes a problem when a tag column has high cardinality:
-## Primary Key and Index
+1. Performance degradation, since the database can't batch rows efficiently.
+2. Increased memory and CPU usage, as the database has to maintain metadata for each time-series.
+3. Expensive deduplication.
-In GreptimeDB, data is organized sequentially based on the `PRIMARY KEY` columns and deduplicated based on the combination of `PRIMARY KEY` and `TIME INDEX` values. GreptimeDB supports data updates by inserting new rows to overwrite existing rows with the same `PRIMARY KEY` and `TIME INDEX` values.
-By default, columns with the `PRIMARY KEY` clause are Tag columns, and columns that are not part of the `PRIMARY KEY` and not the `TIME INDEX` are Field columns. GreptimeDB automatically creates inverted indexes for all Tag columns to enable precise and fast querying and filtering.
+So you must not use a high cardinality column as the primary key or put too many columns in the primary key.
+Currently, the recommended number of unique primary key values is no more than 100 thousand.
+A long primary key negatively affects insert performance and enlarges the memory footprint,
+so a primary key with no more than 5 columns is recommended.
-Example:
+
+Recommendations for tags:
+
+- Low cardinality columns that occur in `WHERE`/`GROUP BY`/`ORDER BY` frequently.
+ These columns usually remain constant.
+ For example, `namespace`, `cluster`, or an AWS `region`.
+- There is no need to set all low cardinality columns as tags, since doing so may impact ingestion and query performance.
+- Typically use short strings and integers for tags, avoiding `FLOAT`, `DOUBLE`, `TIMESTAMP`.
+- Never set high cardinality columns as tags if they change frequently.
+ For example, `trace_id`, `span_id`, `user_id` must not be used as tags.
+ GreptimeDB works well if you set them as fields instead of tags.
+
+
+## Index
+
+Besides the primary key, you can use indexes to accelerate specific queries on demand.
+
+### Inverted Index
+
+GreptimeDB supports the inverted index, which can speed up filtering on low cardinality columns.
+When creating a table, you can specify the [inverted index](/contributor-guide/datanode/data-persistence-indexing.md#inverted-index) columns using the `INVERTED INDEX` clause.
+For example, `http_logs_v3` adds an inverted index for the `http_method` column.
```sql
-CREATE TABLE IF NOT EXISTS system_metrics (
- host STRING,
- idc STRING,
- cpu_util DOUBLE,
- memory_util DOUBLE,
- disk_util DOUBLE,
- `load` DOUBLE,
- ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
- PRIMARY KEY(host, idc),
- TIME INDEX(ts)
-);
+CREATE TABLE http_logs_v3 (
+ access_time TIMESTAMP TIME INDEX,
+ application STRING,
+ remote_addr STRING,
+ http_status STRING,
+ http_method STRING INVERTED INDEX,
+ http_refer STRING,
+ user_agent STRING,
+ request_id STRING,
+ request STRING,
+ PRIMARY KEY(application)
+) with ('append_mode'='true');
```
-Here, `host` and `idc` are primary key columns and Tag columns, and `ts` is the `TIME INDEX`. Other fields like `cpu_util` are Field columns.
+The following query can use the inverted index on the `http_method` column.
-However, this design has limitations. Specifically, it does not support deduplication and optimized sorting for certain columns without creating additional indexes, which can lead to data expansion and performance degradation.
+```sql
+SELECT request FROM http_logs_v3 WHERE application = 'greptimedb' AND http_method = 'GET' AND access_time > now() - '5 minute'::INTERVAL;
+```
-For instance, in a monitoring scenario involving serverless containers,
-the host names of these short-lived containers can cause high cardinality issues if added to the primary key.
-Despite this, deduplication based on host names is still desired.
-Similarly, in IoT scenarios, there may be tens of thousands of devices.
-Adding their IP addresses to the primary key can also result in high cardinality problems,
-yet deduplication based on IP addresses remains necessary.
+The inverted index supports the following operators:
+- `=`
+- `<`
+- `<=`
+- `>`
+- `>=`
+- `IN`
+- `BETWEEN`
+- `~`
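+
+For instance, a sketch of a query against the `http_logs_v3` table above that uses the `IN` operator (the filter values are hypothetical):
+
+```sql
+-- The IN filter on http_method can use the inverted index.
+SELECT request FROM http_logs_v3
+WHERE http_method IN ('GET', 'POST')
+  AND access_time > now() - '5 minute'::INTERVAL;
+```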
-## Separating Primary Key and Inverted Index
-Therefore, starting from `v0.10`, GreptimeDB supports separating the primary key and index.
-When creating a table, you can specify the [inverted index](/contributor-guide/datanode/data-persistence-indexing.md#inverted-index) columns using the `INVERTED INDEX` clause.
-In this case, the `PRIMARY KEY` will not be automatically indexed but will only be used for deduplication and sorting:
+### Skipping Index
+For high cardinality columns like `trace_id` and `request_id`, a [skipping index](/user-guide/manage-data/data-index.md#skipping-index) is more appropriate.
+This method has lower storage overhead and resource usage, particularly in terms of memory and disk consumption.
Example:
```sql
-CREATE TABLE IF NOT EXISTS system_metrics (
- host STRING,
- idc STRING INVERTED INDEX,
- cpu_util DOUBLE,
- memory_util DOUBLE,
- disk_util DOUBLE,
- `load` DOUBLE,
- ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
- PRIMARY KEY(host, idc),
- TIME INDEX(ts)
-);
+CREATE TABLE http_logs_v4 (
+ access_time TIMESTAMP TIME INDEX,
+ application STRING,
+ remote_addr STRING,
+ http_status STRING,
+ http_method STRING INVERTED INDEX,
+ http_refer STRING,
+ user_agent STRING,
+ request_id STRING SKIPPING INDEX,
+ request STRING,
+ PRIMARY KEY(application)
+) with ('append_mode'='true');
```
-The `host` and `idc` columns remain as primary key columns and are used in conjunction with `ts` for data deduplication and sorting optimization.
-However, they will no longer be automatically indexed by default.
-By using the `INVERTED INDEX` column constraint,
-an inverted index is created specifically for the `idc` column.
-This approach helps to avoid potential performance and storage issues that could arise from the high cardinality of the `host` column.
+The following query can use the skipping index to filter the `request_id` column.
-## Full-Text Index
+```sql
+SELECT request FROM http_logs_v4 WHERE application = 'greptimedb' AND request_id = '25b6f398-41cf-4965-aa19-e1c63a88a7a9' AND access_time > now() - '5 minute'::INTERVAL;
+```
-For log text-type Field columns that require tokenization and inverted index-based querying, GreptimeDB provides full-text indexing.
+However, note that the query capabilities of the skipping index are generally inferior to those of the inverted index.
+The skipping index only supports the equality operator; it can't handle complex filter conditions and may filter less effectively on low cardinality columns.
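+
+As a sketch against the `http_logs_v4` table above (the UUID and prefix are hypothetical):
+
+```sql
+-- Equality filter: can be accelerated by the skipping index.
+SELECT request FROM http_logs_v4
+WHERE request_id = '25b6f398-41cf-4965-aa19-e1c63a88a7a9';
+
+-- Range filter: not supported by the skipping index, so this scans without it.
+SELECT request FROM http_logs_v4
+WHERE request_id > '25b6f398';
+```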
-Example:
+
+### Full-Text Index
+
+For unstructured log messages that require tokenization and term-based search, GreptimeDB provides the full-text index.
+
+For example, the `raw_logs` table stores unstructured logs in the `message` field.
```sql
-CREATE TABLE IF NOT EXISTS `logs` (
+CREATE TABLE IF NOT EXISTS `raw_logs` (
message STRING NULL FULLTEXT INDEX WITH(analyzer = 'English', case_sensitive = 'false'),
ts TIMESTAMP(9) NOT NULL,
TIME INDEX (ts),
-);
+) with ('append_mode'='true');
```
The `message` field is full-text indexed using the `FULLTEXT INDEX` option.
See [fulltext column options](/reference/sql/create.md#fulltext-column-option) for more information.
-## Skipping Index
+Storing and querying structured logs usually performs better than searching unstructured logs with a full-text index.
+It's recommended to [use a Pipeline](/user-guide/logs/quick-start.md#create-a-pipeline) to convert logs into structured logs.
-For `trace_id` in link tracking or IP addresses and MAC addresses in server access logs,
-using a [skipping index](/user-guide/manage-data/data-index.md#skipping-index) is more appropriate.
-This method has lower storage overhead and resource usage,
-particularly in terms of memory consumption.
-Example:
+### When to use index
-```sql
-CREATE TABLE sensor_data (
- domain STRING PRIMARY KEY,
- device_id STRING SKIPPING INDEX,
- temperature DOUBLE,
- timestamp TIMESTAMP TIME INDEX,
-);
-```
+Indexes in GreptimeDB are flexible and powerful.
+You can create an index for any column, whether the column is a tag or a field.
+However, it's meaningless to create an additional index for the timestamp column.
+Generally, you don't need to create indexes for all columns,
+since maintaining indexes introduces additional cost and may stall ingestion.
+A bad index may occupy too much disk space and make queries slower.
-In this example, the skipping index is applied to the `device_id` column.
-However, note that the query efficiency and capability of the skipping index are generally inferior to those of the full-text index and inverted index.
-## Comparison and Selection of Index Types
+You can use a table without additional indexes as a baseline.
+There is no need to create an index if the table's query performance is already acceptable.
+Create an index for a column when:
-| | Inverted Index | Full-Text Index | Skip Index|
-| ----- | ----------- | ------------- |------------- |
-| Suitable Scenarios | - Data queries based on tag values
- Filtering operations on string columns
- Precise queries on tag columns | - Text content search
- Pattern matching queries
- Large-scale text filtering|- Sparse data distribution scenarios, such as MAC addresses in logs
- Querying infrequent values in large datasets|
-| Creation Method | - Specified using `INVERTED INDEX` |- Specified using `FULLTEXT` in column options | - Specified using `SKIPPING INDEX` in column options |
+- The column occurs frequently in filters.
+- Filtering the column without an index isn't fast enough.
+- There is a suitable index type for the column.
-## High Cardinality Issues
-High cardinality data impacts GreptimeDB by increasing memory usage and reducing compression efficiency. The size of inverted indexes can expand dramatically with increasing cardinality.
+The following table lists the suitable scenarios of all index types.
-To manage high cardinality:
+| | Inverted Index | Full-Text Index | Skip Index|
+| ----- | ----------- | ------------- |------------- |
+| Suitable Scenarios | - Filtering low cardinality columns | - Text content search | - Precise filtering high cardinality columns |
+| Creation Method | - Specified using `INVERTED INDEX` |- Specified using `FULLTEXT INDEX` in column options | - Specified using `SKIPPING INDEX` in column options |
-1. **Modeling Adjustments**: Determine whether a column should be a Tag column.
-2. **Index Optimization**: Assess if a column should be part of the inverted index based on its usage in queries. Remove columns from the inverted index if they are infrequently used as filtering conditions, do not require exact matching, or have extreme selectivity.
-3. **Alternative Indexing**: For columns with high selectivity, consider using a skipping index (`SKIPPING INDEX`) to enhance filtering query performance.
-## Append-Only Tables
+## Deduplication
-If your business data allows duplicates, has minimal updates,
-or can handle deduplication at the application layer,
-consider using append-only tables.
-Append-only tables have higher scan performance because the storage engine can skip merge and deduplication operations.
-Additionally, if the table is an append-only table,
-the query engine can use statistics to speed up certain queries.
-
-Example:
+If deduplication is necessary, you can use the default table options, which set `append_mode` to `false` and enable deduplication.
```sql
-CREATE TABLE `origin_logs` (
- `message` STRING FULLTEXT INDEX,
- `time` TIMESTAMP TIME INDEX
-) WITH (
- append_mode = 'true'
+CREATE TABLE IF NOT EXISTS system_metrics (
+ host STRING,
+ cpu_util DOUBLE,
+ memory_util DOUBLE,
+ disk_util DOUBLE,
+ ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+ PRIMARY KEY(host),
+ TIME INDEX(ts)
);
```
-Please refer to the [CREATE statement table options](/reference/sql/create.md#table-options) for more information.
+GreptimeDB deduplicates rows with the same primary key and timestamp if the table isn't append-only.
+For example, the `system_metrics` table removes duplicate rows by `host` and `ts`.
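+
+As a hedged sketch of this behavior (the inserted values are hypothetical), inserting two rows with the same `host` and `ts` leaves a single row:
+
+```sql
+INSERT INTO system_metrics (host, cpu_util, memory_util, disk_util, ts)
+VALUES ('host1', 10.0, 20.0, 30.0, '2024-01-01 00:00:00');
+INSERT INTO system_metrics (host, cpu_util, memory_util, disk_util, ts)
+VALUES ('host1', 15.0, 25.0, 35.0, '2024-01-01 00:00:00');
+
+-- Returns one row; with the default merge strategy the later values win.
+SELECT host, cpu_util FROM system_metrics WHERE host = 'host1';
+```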
-## Data updating and merging
+### Data updating and merging
-When the values of the `PRIMARY KEY` column and the timestamp `TIME INDEX` column match existing data,
-new data can be inserted to overwrite the existing data.
-By default, if there are multiple Field columns,
-a new value must be provided for each Field column during an update.
-If some values are missing,
-the corresponding Field columns will lose its data after the update.
-This behavior is due to the merge strategy employed by GreptimeDB when encountering multiple rows with the same primary key and time index during a query.
+GreptimeDB supports two different strategies for deduplication: `last_row` and `last_non_null`.
+You can specify the strategy by the `merge_mode` table option.
GreptimeDB uses an LSM Tree-based storage engine,
which does not overwrite old data in place but allows multiple versions of data to coexist.
@@ -247,20 +292,29 @@ requiring all Field values to be provided during updates.
For scenarios where only specific Field values need updating while others remain unchanged,
the `merge_mode` option can be set to `last_non_null`.
-This mode retains the latest value for each field during queries,
+This mode retains the latest non-null value for each field during queries,
allowing updates to provide only the values that need to change.

-In `last_non_null` merge mode,
-the latest value of each field is merged during queries,
-and only the updated values need to be provided.
-
The `last_non_null` merge mode is the default for tables created automatically via the InfluxDB line protocol,
aligning with InfluxDB's update behavior.
+The `last_row` merge mode doesn't have to check each individual field value, so it is usually faster than the `last_non_null` mode.
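+
+A sketch of setting the strategy explicitly via the `merge_mode` table option (the `app_metrics` table and its values are hypothetical):
+
+```sql
+CREATE TABLE IF NOT EXISTS app_metrics (
+  host STRING,
+  cpu DOUBLE,
+  memory DOUBLE,
+  ts TIMESTAMP TIME INDEX,
+  PRIMARY KEY(host)
+) WITH ('merge_mode'='last_non_null');
+
+-- Updates only cpu; memory keeps its last non-null value at query time.
+INSERT INTO app_metrics (host, cpu, ts) VALUES ('host1', 80.0, '2024-01-01 00:00:00');
+```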
Note that `merge_mode` cannot be set for Append-Only tables, as they do not perform merges.
+### When to use append-only tables
+
+If you don't need the following features, you can use append-only tables:
+
+- Deduplication
+- Deletion
+
+GreptimeDB implements `DELETE` by deduplicating rows, so append-only tables don't currently support deletion.
+Deduplication requires more computation and restricts the parallelism of ingestion and querying,
+so append-only tables usually have better query performance.
+
+
## Wide Table vs. Multiple Tables
In monitoring or IoT scenarios, multiple metrics are often collected simultaneously.
@@ -268,8 +322,7 @@ We recommend placing metrics collected simultaneously into a single table to imp

-Although Prometheus traditionally uses multiple tables for storage,
-GreptimeDB's Prometheus Remote Storage protocol supports wide table data sharing at the underlying layer through the [Metric Engine](/contributor-guide/datanode/metric-engine.md).
+Although Prometheus uses a single-value model for metrics, GreptimeDB's Prometheus Remote Storage protocol supports sharing a wide table for metrics at the underlying layer through the [Metric Engine](/contributor-guide/datanode/metric-engine.md).
## Distributed Tables
@@ -277,8 +330,7 @@ GreptimeDB supports partitioning data tables to distribute read/write hotspots a
### Two misunderstandings about distributed tables
-As a time-series database, GreptimeDB automatically organizes data based on the TIME INDEX column at the storage layer,
-ensuring physical continuity and order.
+As a time-series database, GreptimeDB automatically partitions data based on the TIME INDEX column at the storage layer.
Therefore, it is unnecessary and not recommended for you to partition data by time
(e.g., one partition per day or one table per week).
@@ -288,17 +340,31 @@ with each partition containing all columns.
### When to Partition and Determining the Number of Partitions
-GreptimeDB releases a [benchmark report](https://github.com/GreptimeTeam/greptimedb/tree/main/docs/benchmarks/tsbs) with each major version update,
-detailing the write efficiency of a single partition.
+A single table can utilize all the resources of the machine, especially during queries.
+Partitioning a table doesn't always improve performance:
+
+- A distributed query plan isn't always as efficient as a local query plan.
+- Distributed queries may introduce additional data transmission across the network.
+
+
+There is no need to partition a table unless a single machine can't serve it.
+For example:
+
+- There is not enough local disk space to store the data or to cache the data when using object stores.
+- You need more CPU cores to improve the query performance or more memory for costly queries.
+- The disk throughput becomes the bottleneck.
+- The ingestion rate exceeds the throughput of a single node.
+
+GreptimeDB releases a [benchmark report](https://github.com/GreptimeTeam/greptimedb/tree/VAR::greptimedbVersion/docs/benchmarks/tsbs) with each major version update,
+detailing the ingestion throughput of a single partition.
Use this report alongside your target scenario to estimate if the write volume approaches the single partition's limit.
+
To estimate the total number of partitions,
-consider the write volume and reserve an additional 30%-50% capacity
+consider the write throughput and reserve an additional 50% of CPU resources
to ensure query performance and stability. Adjust this ratio as necessary.
-For instance, if the average write volume for a table is 3 million rows per second
-and the single partition write limit is 500,000 rows per second,
-you might plan for peak write volumes of up to 5 million rows per second with low query loads.
-In this case, you would reserve 10-12 partitions.
+You can reserve more CPU cores if there are more queries.
+
### Partitioning Methods
@@ -308,8 +374,8 @@ select partition keys that are evenly distributed, stable, and align with query
Examples include:
-- Partitioning by the prefix or suffix of MAC addresses.
-- Partitioning by data center number.
+- Partitioning by the prefix of a trace id.
+- Partitioning by data center name.
- Partitioning by business name.
The partition key should closely match the query conditions.
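+
+For example, partitioning by data center name could be sketched as follows (syntax per the table sharding guide; the `dc_metrics` table and the split point are hypothetical):
+
+```sql
+CREATE TABLE dc_metrics (
+  dc STRING,
+  host STRING,
+  cpu DOUBLE,
+  ts TIMESTAMP TIME INDEX,
+  PRIMARY KEY(dc, host)
+)
+PARTITION ON COLUMNS (dc) (
+  dc < 'dc5',
+  dc >= 'dc5'
+);
+```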
@@ -318,4 +384,3 @@ using the data center name as a partition key is appropriate.
If the data distribution is not well understood, perform aggregate queries on existing data to gather relevant information.
For more details, refer to the [table partition guide](/user-guide/administration/manage-data/table-sharding.md#partition).
-
diff --git a/docs/user-guide/concepts/data-model.md b/docs/user-guide/concepts/data-model.md
index 3dde2a538..621c71ca2 100644
--- a/docs/user-guide/concepts/data-model.md
+++ b/docs/user-guide/concepts/data-model.md
@@ -7,31 +7,31 @@ description: Describes the data model of GreptimeDB, focusing on time-series tab
## Model
-GreptimeDB uses the time-series table to guide the organization, compression, and expiration management of data.
+GreptimeDB uses the [time-series](https://en.wikipedia.org/wiki/Time_series) table to guide the organization, compression, and expiration management of data.
The data model is mainly based on the table model in relational databases while considering the characteristics of metrics, logs, and events data.
-All data in GreptimeDB is organized into tables with names. Each data item in a table consists of three types of columns: `Tag`, `Timestamp`, and `Field`.
+All data in GreptimeDB is organized into tables with names. Each data item in a table consists of three semantic types of columns: `Tag`, `Timestamp`, and `Field`.
- Table names are often the same as the indicator names, log source names, or metric names.
-- `Tag` columns store metadata that is commonly queried.
- The values in `Tag` columns are labels attached to the collected sources,
- generally used to describe a particular characteristic of these sources.
- `Tag` columns are indexed, making queries on tags performant.
+- `Tag` columns uniquely identify the time-series.
+ Rows with the same `Tag` values belong to the same time-series.
+ Some TSDBs may also call them labels.
- `Timestamp` is the root of a metrics, logs, and events database.
It represents the date and time when the data was generated.
- Timestamps are indexed, making queries on timestamps performant.
- A table can only have one timestamp column, which is called time index.
+ A table can only have one column with the `Timestamp` semantic type, which is also called the `time index`.
- The other columns are `Field` columns.
Fields contain the data indicators or log contents that are collected.
These fields are generally numerical values or string values,
- but may also be other types of data, such as geographic locations.
- Fields are not indexed by default,
- and queries on field values scan all data in the table. It can be resource-intensive and underperformant.
- However, the string field can turn on the [full-text index](/user-guide/logs/query-logs.md#full-text-index-for-accelerated-search) to speed up queries such as log searching.
+ but may also be other types of data, such as geographic locations or timestamps.
-### Metric Table
+A table clusters rows belonging to the same time-series and sorts them by `Timestamp`.
+The table can also deduplicate rows with the same `Tag` and `Timestamp` values, depending on the requirements of the application.
+GreptimeDB stores and processes data by time-series.
+Choosing the right schema is crucial for efficient data storage and retrieval; please refer to the [schema design guide](/user-guide/administration/design-table.md) for more details.
-Suppose we have a time-series table called `system_metrics` that monitors the resource usage of a standalone device:
+### Metrics
+
+Suppose we have a table called `system_metrics` that monitors the resource usage of machines in data centers:
```sql
CREATE TABLE IF NOT EXISTS system_metrics (
@@ -50,21 +50,21 @@ The data model for this table is as follows:

-Those are very similar to the table model everyone is familiar with. The difference lies in the `Timestamp` constraint, which is used to specify the `ts` column as the time index column of this table.
+This is very similar to the table model everyone is familiar with. The difference lies in the `TIME INDEX` constraint, which is used to specify the `ts` column as the time index column of this table.
- The table name here is `system_metrics`.
-- For `Tag` columns, the `host` column represents the hostname of the collected standalone machine,
- while the `idc` column shows the data center where the machine is located.
- These are queried metadata and can be effectively used to filter data when querying.
+- The `PRIMARY KEY` constraint specifies the `Tag` columns of the table.
+ The `host` column represents the hostname of the collected standalone machine.
+ The `idc` column shows the data center where the machine is located.
- The `Timestamp` column `ts` represents the time when the data is collected.
- It can be effectively used when querying data with a time range.
- The `cpu_util`, `memory_util`, `disk_util`, and `load` columns in the `Field` columns represent
the CPU utilization, memory utilization, disk utilization, and load of the machine, respectively.
- These columns contain the actual data and are not indexed, but they can be efficiently computed and evaluated, such as the latest value, maximum/minimum value, average, percentage, and so on. Please avoid using `Field` columns in query conditions,
- which is highly resource-intensive and underperformant.
+ These columns contain the actual data.
+- The table sorts and deduplicates rows by `host`, `idc`, `ts`. Since deduplication must be applied at query time, `select count(*) from system_metrics` has to scan all rows.
+
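+Tag columns pay off when queries filter on them. As a sketch (assuming the `system_metrics` table above exists and contains data; the value `idc0` is illustrative), a typical query filters on a tag and the time index, then aggregates the fields:
+
+```sql
+-- Average CPU utilization per host in one data center over the last hour.
+-- Filtering on the tag (idc) and the time index (ts) is efficient;
+-- the field cpu_util is only aggregated, never used as a filter.
+SELECT host, avg(cpu_util) AS avg_cpu
+FROM system_metrics
+WHERE idc = 'idc0'
+  AND ts >= now() - '1 hour'::INTERVAL
+GROUP BY host;
+```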
+### Events
-### Log Table
-Another example is creating a log table for access logs:
+Another example is creating a table for events like access logs:
```sql
CREATE TABLE access_logs (
@@ -74,31 +74,35 @@ CREATE TABLE access_logs (
http_method STRING,
http_refer STRING,
user_agent STRING,
- request STRING FULLTEXT INDEX,
- PRIMARY KEY (http_status, http_method)
+ request STRING
) with ('append_mode'='true');
```
- The time index column is `access_time`.
-- `http_status`, `http_method` are tags.
-- `remote_addr`, `http_refer`, `user_agent` and `request` are fields. `request` is a field that enables full-text index by the [`FULLTEXT INDEX` column option](/reference/sql/create.md#fulltext-column-option).
-- The table is an [append-only table](/reference/sql/create.md#create-an-append-only-table) for storing logs that may have duplicate timestamps under the same primary key.
+- There are no tags.
+- `http_status`, `http_method`, `remote_addr`, `http_refer`, `user_agent` and `request` are fields.
+- The table sorts rows by `access_time`.
+- The table is an [append-only table](/reference/sql/create.md#create-an-append-only-table) for storing logs; append-only tables do not support deletion or deduplication.
+- Querying an append-only table is usually faster. For example, `select count(*) from access_logs` can answer directly from statistics since deduplication doesn't have to be considered.
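+As a sketch, a typical query on this table scans only a narrow time range, relying on the `access_time` ordering:
+
+```sql
+-- Count requests per HTTP status over the last 10 minutes.
+-- The time-range filter keeps the scan small even though the table has no tags.
+SELECT http_status, count(*) AS requests
+FROM access_logs
+WHERE access_time > now() - '10 minute'::INTERVAL
+GROUP BY http_status;
+```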
-To learn how to indicate `Tag`, `Timestamp`, and `Field` columns, Please refer to [table management](/user-guide/administration/manage-data/basic-table-operations.md#create-a-table) and [CREATE statement](/reference/sql/create.md).
-Of course, you can place metrics and logs in a single table at any time, which is also a key capability provided by GreptimeDB.
+To learn how to indicate `Tag`, `Timestamp`, and `Field` columns, please refer to [table management](/user-guide/administration/manage-data/basic-table-operations.md#create-a-table) and [CREATE statement](/reference/sql/create.md).
+
## Design Considerations
-GreptimeDB is designed on top of Table for the following reasons:
+GreptimeDB is designed on top of the table model for the following reasons:
-- The Table model has a broad group of users and it's easy to learn, that we just introduced the concept of time index to the metrics, logs, and events.
-- Schema is meta-data to describe data characteristics, and it's more convenient for users to manage and maintain. By introducing the concept of schema version, we can better manage data compatibility.
-- Schema brings enormous benefits for optimizing storage and computing with its information like types, lengths, etc., on which we could conduct targeted optimizations.
-- When we have the Table model, it's natural for us to introduce SQL and use it to process association analysis and aggregation queries between various tables, offsetting the learning and use costs for users.
-- Use a multi-value model where a row of data can have multiple field columns,
+- The table model has a broad group of users and it's easy to learn; we have simply introduced the concept of time index to metrics, logs, and events.
+- Schema is metadata that describes data characteristics, and it's more convenient for users to manage and maintain.
+- Schema brings enormous benefits for optimizing storage and computing with its information like types, lengths, etc., on which we can conduct targeted optimizations.
+- When we have the table model, it's natural for us to introduce SQL and use it to process association analysis and aggregation queries between various tables, reducing the learning and usage costs for users.
+- We use a multi-value model where a row of data can have multiple field columns,
instead of the single-value model adopted by OpenTSDB and Prometheus.
- The multi-value model is used to model data sources, where a metric can have multiple values represented by fields.
- The advantage of the multi-value model is that it can write or read multiple values to the database at once, reducing transfer traffic and simplifying queries. In contrast, the single-value model requires splitting the data into multiple records. Read the [blog](https://greptime.com/blogs/2024-05-09-prometheus) for more detailed benefits of multi-value mode.
+ The multi-value model is used to model data sources where a metric can have multiple values represented by fields.
+ The advantage of the multi-value model is that it can write or read multiple values to the database at once, reducing transfer traffic and simplifying queries. In contrast, the single-value model requires splitting the data into multiple records. Read the [blog](https://greptime.com/blogs/2024-05-09-prometheus) for more detailed benefits of the multi-value model.
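+For example, with the `system_metrics` table above, one insert carries all metrics sampled at the same instant (the values are illustrative):
+
+```sql
+-- One row holds several related metrics; a single-value model would need
+-- four separate records for the same host and timestamp.
+INSERT INTO system_metrics (host, idc, cpu_util, memory_util, disk_util, `load`, ts)
+VALUES ('host1', 'idc0', 11.8, 10.3, 10.3, 0.5, '2024-07-01 00:00:00');
+```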
+
-GreptimeDB uses SQL to manage table schema. Please refer to [table management](/user-guide/administration/manage-data/basic-table-operations.md) for more information. However, our definition of schema is not mandatory and leans towards a **schemaless** approach, similar to MongoDB. For more details, see [Automatic Schema Generation](/user-guide/ingest-data/overview.md#automatic-schema-generation).
+GreptimeDB uses SQL to manage table schema. Please refer to [table management](/user-guide/administration/manage-data/basic-table-operations.md) for more information.
+However, our definition of schema is not mandatory and leans towards a **schemaless** approach, similar to MongoDB.
+For more details, see [Automatic Schema Generation](/user-guide/ingest-data/overview.md#automatic-schema-generation).
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/administration/design-table.md b/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/administration/design-table.md
index 30c80f3fa..8d36b9474 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/administration/design-table.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/administration/design-table.md
@@ -6,7 +6,7 @@ description: 详细介绍了 GreptimeDB 的数据模型使用指南,以及常
# 数据建模指南
表结构设计将极大影响写入和查询性能。
-在写入数据之前,你需要了解业务中涉及到的数据类型、数量规模以及常用查询,
+在写入数据之前,你需要了解业务中涉及到的数据类型、数据规模以及常用查询,
并根据这些数据特征进行数据建模。
本文档将详细介绍 GreptimeDB 的数据模型使用指南,以及常见场景的表结构设计方式。
@@ -16,284 +16,351 @@ description: 详细介绍了 GreptimeDB 的数据模型使用指南,以及常
## 基本概念
-基数(Cardinality):是指数据集中唯一值的数量。我们可以通过"高基数"和"低基数"来分类:
+**基数(Cardinality)**:指数据集中唯一值的数量。可以分为"高基数"和"低基数":
-- 低基数(Low Cardinality)示例:订单状态包括 "待付款/已付款/已发货/已完成"等,约 4~5 个唯一的值,星期几固定是 7 个,城市和省份的数量也是有限。
-- 高基数(High Cardinality)示例:用户 ID 是典型的,比如可能是百万到千万的用户数量;设备 IP 地址,也可能是数百万个;商品 SKU 也是一个典型的高基数数据。
+- **低基数(Low Cardinality)**:低基数列通常具有固定值。
+ 唯一值的总数通常不超过 1 万个。
+ 例如,`namespace`、`cluster`、`http_method` 通常是低基数的。
+- **高基数(High Cardinality)**:高基数列包含大量的唯一值。
+ 例如,`trace_id`、`span_id`、`user_id`、`uri`、`ip`、`uuid`、`request_id`、表的自增 ID、时间戳通常是高基数的。
-## 列的类型及选择
-GreptimeDB 中列区分为三种类型:Tag、Field 和 Time Index。
-时间戳不用做过多讨论,一般是数据采样的时间或者日志、事件发生的时间作为 Time Index 列。
-GreptimeDB 也会按照 Time Index 来优化数据组织,提升查询性能。
+## 列类型
-我们重点讨论 Tag 和 Field,以及如何为列选择正确的类型。
+在 GreptimeDB 中,列被分为三种语义类型:`Tag`、`Field` 和 `Timestamp`。
+时间戳通常表示数据采样的时间或日志/事件发生的时间。
+GreptimeDB 使用 `TIME INDEX` 约束来标识 `Timestamp` 列。
+因此,`Timestamp` 列也被称为 `TIME INDEX` 列。
+如果你有多个时间戳数据类型的列,你只能将其中一个定义为 `TIME INDEX`,其他的定义为 `Field` 列。
-### Tag 列
+在 GreptimeDB 中,Tag 列是可选的。Tag 列的主要用途包括:
-Tag 列,也称为标签列(Label),一般来说是携带了度量数据或者日志、事件的元信息。
-举例来说,采集全国的气象温度数据,那么城市(city)就是一个典型的标签列,
-比如 `city="New York"`;监控中,采集系统的 CPU、内存等指标,
-通常会有 `host` 标签来表示主机名。
-Kubernetes 的 `pod` 容器就带有大量的 label。
+1. 定义存储时数据的排序方式。
+ GreptimeDB 使用 `PRIMARY KEY` 约束来定义 tag 列和 tag 的排序顺序。
+ 与传统数据库不同,GreptimeDB 的主键是用来定义时间序列的。
+ GreptimeDB 中的表按照 `(primary key, timestamp)` 的顺序对行进行排序。
+ 这提高了具有相同 tag 数据的局部性。
+ 如果没有定义 tag 列,GreptimeDB 按时间戳对行进行排序。
+2. 唯一标识一个时间序列。
+ 当表不是 append-only 模式时,GreptimeDB 根据时间戳在同一时间序列(主键)下去除重复行。
+3. 便于从使用 label 或 tag 的其他时序数据库迁移。
-Tag 列在 GreptimeDB 中的主要用途包括:
-1. 存储低基数(low-cardinality)的元信息
-2. 用于数据的过滤,例如去查看纽约市过去一周的平均气温,城市 `city` 就作为一个过滤条件来使用,相当于 SQL 中出现在 `WHERE` 中的条件
-3. 用于数据的分组和聚合,例如,假设气温的数据,除了 `city` 之外我们还有个省份标签 `state`,那我们可以按照省份来分组数据,并聚合计算一个省份过去一周的平均气温,相当于 SQL 中的 `GROUP BY` 字段
+## 主键
-GreptimeDB 中将加入 `PRIMARY KEY` 的列都认为是 Tag 列,并默认将为这些列建立倒排索引(指定 `INVERTED INDEX` 约束会带来一些变化,我们将在索引一节展开)。
-我们建议:
+### 主键是可选的
-- Tag 列的值类型通常使用字符串,避免使用 `FLOAT` 或 `DOUBLE`
-- 一张表中 Tag 列的数量控制在一个适中的范围内,通常不超过 20 个
-- Tag 列中的唯一值数量控制在一个适中的范围内,避免高基数问题,高基数会影响写入性能并导致索引膨胀
-- Tag 列的值不会频繁变化,一个错误范例就是将 serverless 容器的主机名作为 tag 列
+错误的主键或索引可能会显著降低性能。
+通常,你可以创建一个没有主键的仅追加表,因为对于许多场景来说,按时间戳排序数据已经有不错的性能了。
+这也可以作为性能基准。
-### Field 列
+```sql
+CREATE TABLE http_logs (
+ access_time TIMESTAMP TIME INDEX,
+ application STRING,
+ remote_addr STRING,
+ http_status STRING,
+ http_method STRING,
+ http_refer STRING,
+ user_agent STRING,
+ request_id STRING,
+ request STRING
+) with ('append_mode'='true');
+```
-Field 列,一般来说就是携带了度量的实际值,仍然以气温数据为例,温度这个度量的值通常都应该设置为 Field 列。监控系统中的 CPU 利用率、内存利用率、磁盘利用率等,也是典型的 Field 列。
+`http_logs` 表是存储 HTTP 服务器日志的示例。
-它的数据特点:
-1. 通常是数值类型(整数、浮点数),日志和事件消息一般是字符串
-2. 用于计算和聚合,比如求平均值,最大值,P99 等
-3. 可以高频率变化,也就是可以是任意基数(cardinality)的
+- `'append_mode'='true'` 选项将表创建为仅追加表。
+ 这确保一个日志不会覆盖另一个具有相同时间戳的日志。
+- 该表按时间对日志进行排序,因此按时间搜索日志效率很高。
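+例如,下面的查询只需扫描最近一小时的日志(示例,假设已创建上文的 `http_logs` 表):
+
+```sql
+-- 按时间范围查询,利用表按 access_time 排序的特性
+SELECT * FROM http_logs
+WHERE access_time > now() - '1 hour'::INTERVAL;
+```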
-GreptimeDB 中不在 `PRIMARY KEY` 的非 `TIME INDEX` 列就是 Field 列,GreptimeDB 不会为这些列建索引。
-使用上我们建议:
-1. 避免将过滤条件作用在 Field 中
-2. 适合需要做计算和聚合的数据
-3. 适合存储高频变化也就是高基数的数据
+### 何时使用主键
-### Tag 列 vs. Field 列
+当有适合的低基数列且满足以下条件之一时,可以使用主键:
-| | Tag 列 | Field 列 |
-| ----- | ----------- | ------------- |
-| 主要用途 | - 用于数据分类和筛选
- 建立索引加速查询
- 数据分组和上下文元信息记录 | - 存储实际的测量值和指标
- 用于计算和聚合
- 作为分析的目标数据|
-| 数据特点 | - 通常为字符串类型
- 相对稳定,变化频率低
- 自动建索引
- 通常是低基数
- 索引会占用额外存储空间 |- 通常为数值类型(整数、浮点数),日志事件可能是字符串
- 高频变化
- 不建索引
- 可以是高基数
- 存储开销相对较小 |
-| 使用建议 | - 用于频繁的查询过滤条件
- 控制基数以避免索引膨胀
- 选择有意义的分类标签,避免存储度量值导致高基数 | - 存储需要计算和聚合的指标
- 避免用作查询过滤条件
- 适合存储高频变化的数据
- 配合时间戳使用做时序分析 |
-| 实际例子 | - 机房:`dc-01`
- 环境:`prod/dev`
- 服务名:`api-server`
- 主机名:`host-01`
- 城市,例如 `"New York"` | - CPU 使用率:`75.5`
- 内存使用量:`4096MB`
- 请求响应时间:`156ms`
- 温度:`25.6°C`
- 队列长度:`1000`|
+- 大多数查询可以从排序中受益。
+- 你需要通过主键和时间索引对行进行去重(包括删除)。
-## 时间线
+例如,如果你总是只查询特定应用程序的日志,可以将 `application` 列设为主键(tag)。
-介绍完 Tag 和 Field 列后,我们将引入时间线概念。
+```sql
+SELECT request FROM http_logs WHERE application = 'greptimedb' AND access_time > now() - '5 minute'::INTERVAL;
+```
-时间线在 GreptimeDB 数据模型中至关重要,与 Tag 列和 Field 列紧密相关,是高效存储和查询数据的基础。
-时间线是按时间顺序排列的数据点集合,
-由唯一的 Tag 集合和 Time Index 标识。
-如采集全国气象温度数据,`city = New York` 且 `state = New York State` 的每天温度数据构成一条时间线,每个数据点对应时间戳和温度值。
+应用程序的数量通常是有限的。表 `http_logs_v2` 使用 `application` 作为主键。
+它按应用程序对日志进行排序,因此查询同一应用程序下的日志速度更快,只需要扫描少量行。设置 tag 还能减少磁盘空间占用,因为这提高了数据的局部性,对压缩更友好。
-时间线使 GreptimeDB 能高效处理和存储时间序列数据,通过唯一 Tag 集合可快速定位检索特定时间范围数据,还能优化存储减少冗余。
-在实际应用中,理解时间线概念对设计表结构和优化查询性能关键,
-不同时间线特性不同,可据此优化表结构和查询策略,
-合理组织 Tag 列、Field 列和 Time Index 能构建高效数据模型满足业务需求。
-总之,时间线是 GreptimeDB 数据模型的桥梁,理解运用其概念有助于数据建模和处理。
+```sql
+CREATE TABLE http_logs_v2 (
+ access_time TIMESTAMP TIME INDEX,
+ application STRING,
+ remote_addr STRING,
+ http_status STRING,
+ http_method STRING,
+ http_refer STRING,
+ user_agent STRING,
+ request_id STRING,
+ request STRING,
+ PRIMARY KEY(application)
+) with ('append_mode'='true');
+```
-## 主键和索引
+为了提高时序场景下的排序和去重速度,GreptimeDB 内部按时间序列缓冲和处理行。
+因此,它不需要反复比较每行的主键。
+如果 tag 列具有高基数,这可能会成为问题:
-在 GreptimeDB 中,数据依照主键列 `PRIMARY KEY` 进行顺序组织,
-并基于 `PRIMARY KEY` 和 `TIME INDEX` 的值的组合(也就是时间线)来执行去重操作。
-GreptimeDB 中对数据更新的支持是通过插入覆盖具有相同 `PRIMARY KEY` 和 `TIME INDEX` 值的行来达成的。
-你能够更新 Field 列的值,但无法更改主键列和 `TIME INDEX` 的值,不过可以将其删除。
+1. 由于数据库无法有效地批处理行,性能可能会降低。
+2. 由于数据库必须为每个时间序列维护元数据,可能会增加内存和 CPU 使用率。
+3. 去重可能会变得过于昂贵。
-默认情况下,在建表时候加入 `PRIMARY KEY` 约束的列将被视为 Tag 列,
-没有加入的非 `TIME INDEX` 列即为 Field 列。
-并且默认情况下,GreptimeDB 会为所有 Tag 列建立倒排索引,用于精确和快速的查询和过滤。
-例如:
+因此,不能将高基数列作为主键,也不要在主键中放入过多列。目前建议主键值(即时间序列)的数量不超过 10 万,且主键最好不超过 5 个列。过长的主键会对插入性能产生负面影响,并增加内存占用。
+
+选取 tag 列的建议:
+
+- 在 `WHERE`/`GROUP BY`/`ORDER BY` 中频繁出现的低基数列。
+ 这些列通常很少变化。
+ 例如,`namespace`、`cluster` 或 AWS `region`。
+- 无需将所有低基数列设为 tag,因为这可能影响写入和查询性能。
+- 通常对 tag 使用短字符串和整数,避免 `FLOAT`、`DOUBLE`、`TIMESTAMP`。
+- 如果高基数列经常变化,切勿将其设为 tag。
+  例如,`trace_id`、`span_id`、`user_id` 绝不能用作 tag。
+  如果将它们设为 field 而非 tag,GreptimeDB 可以有很不错的性能。
+
+
+## 索引
+
+除了主键外,你还可以使用倒排索引按需加速特定查询。
+
+GreptimeDB 支持倒排索引,可以加速对低基数列的过滤。
+创建表时,可以使用 `INVERTED INDEX` 子句指定[倒排索引](/contributor-guide/datanode/data-persistence-indexing.md#倒排索引)列。
+例如,`http_logs_v3` 为 `http_method` 列添加了倒排索引。
```sql
-CREATE TABLE IF NOT EXISTS system_metrics (
- host STRING,
- idc STRING,
- cpu_util DOUBLE,
- memory_util DOUBLE,
- disk_util DOUBLE,
- `load` DOUBLE,
- ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
- PRIMARY KEY(host, idc),
- TIME INDEX(ts)
-);
+CREATE TABLE http_logs_v3 (
+ access_time TIMESTAMP TIME INDEX,
+ application STRING,
+ remote_addr STRING,
+ http_status STRING,
+ http_method STRING INVERTED INDEX,
+ http_refer STRING,
+ user_agent STRING,
+ request_id STRING,
+ request STRING,
+ PRIMARY KEY(application)
+) with ('append_mode'='true');
```
+以下查询可以使用 `http_method` 列上的倒排索引。
-这里 `host` 和 `idc` 同时是主键列和 Tag 列,ts 为 `TIME INDEX`,其他字段如 `cpu_util` 等都是 Field 列。
+```sql
+SELECT request FROM http_logs_v3 WHERE application = 'greptimedb' AND http_method = 'GET' AND access_time > now() - '5 minute'::INTERVAL;
+```
-
+倒排索引支持以下运算符:
+- `=`
+- `<`
+- `<=`
+- `>`
+- `>=`
+- `IN`
+- `BETWEEN`
+- `~`
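+例如,下面的查询(示例)中的 `IN` 条件同样可以命中 `http_method` 列上的倒排索引:
+
+```sql
+-- IN 运算符可以利用倒排索引过滤
+SELECT request FROM http_logs_v3
+WHERE http_method IN ('GET', 'POST')
+  AND access_time > now() - '5 minute'::INTERVAL;
+```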
-但是这样的设计下,无法实现以下效果:我想对某些列的数据做去重和排序优化,但是不想为这些列建立额外索引导致数据膨胀和性能下降。
-举例来说,监控场景里的 Serverless 容器都是短生命周期的,如果将这些容器的主机名加入主键,
-很可能导致高基数问题,但是因为采集链路或者网络等问题,
-可能数据延迟,我们还是想基于主机名来做数据的去重,
-这就无法兼得。
-在 IoT 场景也有类似的问题,
-IoT 设备可能成千上万,如果将他们的 ip 地址加入主键,
-也会导致高基数问题,但是我们又希望按照 ip 来做数据的去重。
+### 跳数索引
-## 主键和倒排索引分离
+对于高基数列如 `trace_id`、`request_id`,使用[跳数索引](/user-guide/manage-data/data-index.md#跳数索引)更为合适。
+这种方法的存储开销更低,资源使用量更少,特别是在内存和磁盘消耗方面。
-因此,从 `v0.10` 开始,GreptimeDB 支持将主键和索引分离,创建表的时候可以通过 `INVERTED INDEX` 指定表的[倒排索引](/contributor-guide/datanode/data-persistence-indexing.md#倒排索引)列。对于每一个指定的列,GreptimeDB 会创建倒排索引以加速查询,这种情况下 `PRIMARY KEY` 将不会自动创建索引,而仅是用于去重和排序:
+示例:
-我们改进前面的例子:
+```sql
+CREATE TABLE http_logs_v4 (
+ access_time TIMESTAMP TIME INDEX,
+ application STRING,
+ remote_addr STRING,
+ http_status STRING,
+ http_method STRING INVERTED INDEX,
+ http_refer STRING,
+ user_agent STRING,
+ request_id STRING SKIPPING INDEX,
+ request STRING,
+ PRIMARY KEY(application)
+) with ('append_mode'='true');
+```
+
+以下查询可以使用跳数索引过滤 `request_id` 列。
```sql
-CREATE TABLE IF NOT EXISTS system_metrics (
- host STRING,
- idc STRING INVERTED INDEX,
- cpu_util DOUBLE,
- memory_util DOUBLE,
- disk_util DOUBLE,
- `load` DOUBLE,
- ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
- PRIMARY KEY(host, idc),
- TIME INDEX(ts)
-);
+SELECT request FROM http_logs_v4 WHERE application = 'greptimedb' AND request_id = '25b6f398-41cf-4965-aa19-e1c63a88a7a9' AND access_time > now() - '5 minute'::INTERVAL;
```
-`host` 和 `idc` 列仍然是主键列,结合 `ts` 一起做数据去重和排序优化,但是将默认不再自动为它们建立索引。我们通过 `INVERTED INDEX` 列约束为 `idc` 列建立倒排索引。这样就避免了 `host` 列的高基数可能导致的性能和存储瓶颈。
+然而,请注意跳数索引的查询功能通常不如倒排索引丰富。
+跳数索引只支持等值运算符,无法处理复杂的过滤条件,并且在低基数列上的过滤性能可能较差。
-## 全文索引
+### 全文索引
-对于日志文本类的 Field 字段,如果需要分词结合倒排索引来查询,GreptimeDB 也提供了全文索引功能,例如:
+对于需要分词和按术语搜索的非结构化日志消息,GreptimeDB 提供了全文索引。
+
+例如,`raw_logs` 表在 `message` 字段中存储非结构化日志。
```sql
-Create Table: CREATE TABLE IF NOT EXISTS `logs` (
+CREATE TABLE IF NOT EXISTS `raw_logs` (
message STRING NULL FULLTEXT INDEX WITH(analyzer = 'English', case_sensitive = 'false'),
ts TIMESTAMP(9) NOT NULL,
TIME INDEX (ts),
-)
+) with ('append_mode'='true');
```
-这里的 `message` 字段就通过 `FULLTEXT INDEX` 选项设置了全文索引。详见 [fulltext 列选项](/reference/sql/create.md#fulltext-列选项)。
+`message` 字段使用 `FULLTEXT INDEX` 选项进行全文索引。
+更多信息请参见[fulltext 列选项](/reference/sql/create.md#fulltext-列选项)。
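+创建全文索引后,可以按词条搜索日志。下面的示例使用 `matches` 函数(具体函数名请以当前版本文档为准):
+
+```sql
+-- 在 message 字段中搜索包含 error 或 timeout 的日志
+SELECT * FROM raw_logs
+WHERE matches(message, 'error OR timeout')
+  AND ts > now() - '1 hour'::INTERVAL;
+```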
-## 跳数索引
+存储和查询结构化日志通常比带有全文索引的非结构化日志性能更好。
+建议[使用 Pipeline](/user-guide/logs/quick-start.md#创建-pipeline) 将日志转换为结构化日志。
-对于类似链路追踪里的 `trace_id` 或者服务器访问日志中的 IP 地址、Mac 地址等,[跳数索引](/user-guide/manage-data/data-index.md#跳数索引)是更加合适的索引方式,它的存储开销更小,资源占用尤其是内存消耗更低:
-```sql
-CREATE TABLE sensor_data (
- domain STRING PRIMARY KEY,
- device_id STRING SKIPPING INDEX,
- temperature DOUBLE,
- timestamp TIMESTAMP TIME INDEX,
-);
-```
+### 何时使用索引
-我们这里将 `device_id` 设置为了跳数索引。不过,跳数索引的查询效率和能力,都会逊色于全文索引和倒排索引。
+GreptimeDB 中的索引十分灵活。
+你可以为任何列创建索引,无论该列是 tag 还是 field。
+为 time index 列创建额外索引没有意义。
+另外,你一般不需要为所有列创建索引,
+因为维护索引会带来额外开销,并可能拖慢数据写入。
+不良的索引可能会占用过多磁盘空间并使查询变慢。
-## 索引类型对比和选择
+你可以用没有额外索引的表作为 baseline。
+如果查询性能已经可接受,就无需为表创建索引。
+在以下情况可以为列创建索引:
-| | 倒排索引 | 全文索引 | 跳数索引|
-| ----- | ----------- | ------------- |------------- |
-| 适用场景 | - 基于标签值的数据查询
- 字符串列的过滤操作
- 标签列的精确查询 | - 文本内容搜索
- 模式匹配查询
- 大规模文本过滤|- 数据分布稀疏的场景,例如日志中的 MAC 地址
- 在大规模数据集中查询出现频率较低的值|
-| 创建方式 | - 通过 `INVERTED INDEX` 指定 |- 在列选项中指定 `FULLTEXT` | - 在列选项中指定 `SKIPPING INDEX` |
+- 该列在过滤条件中频繁出现。
+- 没有索引的情况下过滤该列不够快。
+- 该列有合适的索引类型。
+
+下表列出了所有索引类型的适用场景。
-## 高基数问题
-
-因为 GreptimeDB 内部大多数操作都是围绕“时间线”这一概念来组织的,因此需要避免时间线过度地膨胀。高基数数据对 GreptimeDB 的主要影响有两个方面:
-
-- 维护大量时间线导致内存用量增加,同时压缩效率降低。
-- 倒排索引的体积会随着基数增大而剧烈膨胀。
-
-在基数数量过高时,应首先逐一检查作为 Tag 的每一个列是否需要表达“实体”或“去重”的概念,
-即该列是否有必要作为时间线标识的一部分,尝试从建模层面降低基数。
-此外,还应根据查询方式来判断某一列是否应该作为倒排索引的一部分,
-如果该列不经常作为过滤条件、不需要精确匹配或是选择度过高或过低,都应该从倒排索引中去除。
-对于某些选择度过高的列,可以考虑使用跳数索引 SKIPPING INDEX 来加速过滤查询。
+| | 倒排索引 | 全文索引 | 跳数索引|
+| ----- | ----------- | ------------- |------------- |
+| 适用场景 | - 过滤低基数列 | - 文本内容搜索 | - 精确过滤高基数列 |
+| 创建方法 | - 使用 `INVERTED INDEX` 指定 |- 在列选项中使用 `FULLTEXT INDEX` 指定 | - 在列选项中使用 `SKIPPING INDEX` 指定 |
-## Append-Only 表
-如果业务数据容许重复,几乎没有更新的情况,
-或者可以通过上层应用来去重,
-我们会推荐使用 append-only 表。
-一般来说,append-only 表具有更高的扫描性能,
-因为存储引擎可以跳过合并和去重操作。
-此外,如果表是 append-only 表,查询引擎可以使用统计信息来加速某些查询。
-典型的,比如日志的表,通过 pipeline 写入的自动创建的表,默认都会设置为 append-only 表。
+## 去重
-例如我们创建如下日志表:
+如果需要去重,可以使用默认的表选项,即 `append_mode` 为 `false`,此时去重功能是启用的。
```sql
-CREATE TABLE `origin_logs` (
- `message` STRING FULLTEXT INDEX,
- `time` TIMESTAMP TIME INDEX
-) WITH (
- append_mode = 'true'
+CREATE TABLE IF NOT EXISTS system_metrics (
+ host STRING,
+ cpu_util DOUBLE,
+ memory_util DOUBLE,
+ disk_util DOUBLE,
+ ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+ PRIMARY KEY(host),
+ TIME INDEX(ts)
);
```
-设置了 `append_mode = 'true'` 表选项。更多信息请参考 [CREATE 语句建表选项](/reference/sql/create.md#表选项)。
+当表不是 append-only 的时候,GreptimeDB 会通过相同的主键和时间戳对行进行去重。
+例如,`system_metrics` 表通过 `host` 和 `ts` 删除重复行。
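+下面的示例(数值为示意)展示了去重的效果:两次写入具有相同的 `host` 和 `ts`,查询时只保留后写入的一行:
+
+```sql
+INSERT INTO system_metrics (host, cpu_util, memory_util, disk_util, ts)
+VALUES ('host1', 10.0, 20.0, 30.0, '2024-07-01 00:00:00');
+
+-- 主键(host)和时间戳(ts)相同,这一行会覆盖上一行
+INSERT INTO system_metrics (host, cpu_util, memory_util, disk_util, ts)
+VALUES ('host1', 50.0, 60.0, 70.0, '2024-07-01 00:00:00');
+```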
-## 更新和数据合并
+### 数据更新和合并
-前文指出,在 `PRIMARY KEY` 主键列和时间戳 `TIME INDEX` 列的值相同的情形下,
-可通过插入一条新的数据以覆盖已存在的数据。
-倘若存在多个 Field 列,
-默认状况下,对于每个 Field 列均需提供新值(或原有值),
-不可缺失,否则该行数据在更新后,
-未提供值的 Field 列将会丢失。
+GreptimeDB 支持两种不同的去重策略:`last_row` 和 `last_non_null`。
+你可以通过 `merge_mode` 表选项来指定策略。
-这实际上涉及到 GreptimeDB 在进行查询时,
-当遇到多条具有相同主键列和时间索引列的情况所采用的合并策略。
-鉴于 GreptimeDB 采用的是基于 LSM Tree 的存储引擎,
-插入新行时,并不会在原位置覆盖旧数据,而是允许多条数据同时存在,随
-后在查询过程中进行合并。默认的合并行为是 last_row,即以后插入的(row)为准。
+GreptimeDB 使用基于 LSM-Tree 的存储引擎,
+不会就地覆盖旧数据,而是允许多个版本的数据共存。
+这些版本在查询时再合并。
+默认的合并行为是 `last_row`,意味着只保留最近插入的行。

-`last_row` 合并模式:相同主键和时间值的情况下,查询的时候返回最后一次更新的数据,更新需要提供每个 Field 值。
+在 `last_row` 合并模式下,
+对于相同主键和时间值,只返回最新的行,
+所以更新一行时需要提供所有 field 的值。
-但是很多情况下,你可能只是想更新其中一个或者数个 Field 值,其他 Field 值保持不变,
-这种情况下,你可以将表的 `merge_mode` 选项设置为 `last_non_null`,该模式下,查询的时候合并策略将是保留每个字段的最新值:
+对于只需要更新特定 field 值而其他 field 保持不变的场景,
+可以将 `merge_mode` 选项设为 `last_non_null`。
+该模式在查询时会选取每个字段的最新的非空值,
+这样可以在更新时只提供需要更改的 field 的值。
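+例如,下面的示例(`monitor` 表及数值为示意)在 `last_non_null` 模式下只更新部分字段:
+
+```sql
+CREATE TABLE monitor (
+    host STRING,
+    cpu DOUBLE,
+    memory DOUBLE,
+    ts TIMESTAMP TIME INDEX,
+    PRIMARY KEY(host)
+) WITH ('merge_mode'='last_non_null');
+
+INSERT INTO monitor (host, cpu, memory, ts)
+VALUES ('host1', 10.0, 20.0, '2024-07-01 00:00:00');
+
+-- 只提供 cpu 的新值,查询时 memory 仍保留之前的 20.0
+INSERT INTO monitor (host, cpu, ts)
+VALUES ('host1', 30.0, '2024-07-01 00:00:00');
+```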

-`last_non_null` 合并模式:相同主键和时间值的情况下,查询的时候合并每个字段的最新值,更新的时候仅提供要更新的值。
+为了与 InfluxDB 的更新行为一致,通过 InfluxDB line protocol 自动创建的表默认启用 `last_non_null` 合并模式。
+
+`last_row` 合并模式不需要检查每个单独的字段值,因此通常比 `last_non_null` 模式更快。
+请注意,append-only 表不能设置 `merge_mode`,因为它们不执行合并。
+
+### 何时使用 append-only 表
-`'merge_mode'='last_non_null'` 默认也是通过 InfluxDB 行协议写入的自动创建表的默认模式,跟 InfluxDB 的更新行为保持一致。
+如果不需要以下功能,可以使用 append-only 表:
-请注意,Append-Only 的表是无法设置 `merge_mode` 的,因为它不会进行合并行为。
+- 去重
+- 删除
-## 宽表 vs.多表
+GreptimeDB 通过去重功能实现 `DELETE`,因此 append-only 表目前不支持删除。
+去重需要更多的计算并限制了写入和查询的并行性,所以使用 append-only 表通常具有更好的性能。
-表的模型这块,还涉及宽表或者多表模式,通常来说,在监控或者 IoT 场景,一次采样都会同步采集多个指标,典型比如 Prometheus 数据的抓取。
-我们会强烈推荐将同时采样的指标数据放在一张表里,这样能显著地提升读写吞吐以及数据的压缩效率。
+## 宽表 vs. 多表
+
+在监控或 IoT 场景中,通常同时收集多个指标。
+我们建议将同时收集的指标放在单个表中,以提高读写吞吐量和数据压缩效率。

-比较遗憾,Prometheus 的存储还是多表单值的方式,不过 GreptimeDB 的 Prometheus Remote Storage 协议支持,通过 [Metric 引擎](/contributor-guide/datanode/metric-engine.md)在底层实现了宽表的数据共享。
+比较遗憾,Prometheus 的存储还是多表单值的方式,不过 GreptimeDB 的 Prometheus Remote Storage 协议支持通过 [Metric 引擎](/contributor-guide/datanode/metric-engine.md)在底层实现了宽表的数据共享。
## 分布式表
-GreptimeDB 支持对数据表进行分区操作以分散读写热点,来达到水平扩容的目的。
+GreptimeDB 支持对数据表进行分区,以分散读写热点并实现水平扩展。
+
+### 关于分布式表的两个误解
-### 分布式表的两个常见误区
+作为时序数据库,GreptimeDB 在存储层自动基于 TIME INDEX 列对数据进行分区。
+因此,你无需也不建议按时间分区数据
+(例如,每天一个分区或每周一个表)。
-首先作为时序数据库,GreptimeDB 在存储层已经自动基于 TIME INDEX 列组织数据,保证数据在物理上的连续性和有序性。因此无需也不推荐你再按时间进行分区(如一天一个分区或每周一张新表)。
+此外,GreptimeDB 是列式存储数据库,
+因此对表进行分区是指按行进行水平分区,
+每个分区包含所有列。
-此外,GreptimeDB 是列式存储的数据库,所以对表进行分区的时候是指水平按行来分区,每一个分区都包含所有的列。
-### 何时需要分区,以及需要分多少
+### 何时分区以及确定分区数量
-在每个主要版本更新时,GreptimeDB 都会随源码发布最新的[基准测试报告](https://github.com/GreptimeTeam/greptimedb/tree/main/docs/benchmarks/tsbs) ,这份报告便代表着单个分区的写入效率。
-你可以根据这份报告以及目标场景来估算写入量是否到达了单分区的瓶颈。
-假设分区效果理想,通常可直接按照写入量来估算总分区数量,并在估计时按情况预留 30%~50% 的冗余资源来保证查询性能和稳定性。
-该比例可按情况自由调整,
-例如,某场景估算单表平均写入量 300 万行每秒,经过测试发现单分区写入上限为 50 万行每秒。考虑峰值写入量可能达到 500 万行每秒,以及查询负载稳定且较低。因此该场景下可预留 10~12 个分区。
+单个表能够利用机器中的所有资源,特别是在查询的时候。
+分区表并不总是能提高性能:
-### 分区方式
+- 分布式查询计划并不总是像本地查询计划那样高效。
+- 分布式查询可能会引入网络间额外的数据传输。
-GreptimeDB 采用表达式来表示分区规则。具体可参见[用户手册](/user-guide/administration/manage-data/table-sharding.md#partition)。
+因此,除非单台机器不足以服务表,否则无需分区表。
+例如:
-通常来说,为了达到最好的效果,我们推荐分区键尽量均匀分散且稳定,这通常需要一些关于数据分布方式的先验知识。如:
+- 本地磁盘空间不足以存储数据或在使用对象存储时缓存数据。
+- 你需要更多 CPU 来提高查询性能或更多内存用于高开销的查询。
+- 磁盘吞吐量成为瓶颈。
+- 写入速率大于单个节点的吞吐量。
-- 通过 MAC 地址的前/后缀来分区
-- 通过机房编号来分区
-- 通过业务名称
+GreptimeDB 在每次主要版本更新时都会发布[基准测试报告](https://github.com/GreptimeTeam/greptimedb/tree/VAR::greptimedbVersion/docs/benchmarks/tsbs),
+里面提供了单个分区的写入吞吐量作为参考。
+你可以结合目标场景,根据该报告来估计写入量是否接近单个分区的限制。
-同时,分区键也应该尽量贴合查询条件。例如大部分查询只关注某一个机房或业务内的数据,此时机房和业务名称可以作为合适的分区键。如果不清楚具体的数据分布情况,可以通过在已有的数据上进行聚合查询来获取相关的信息。
+估计分区总数时,可以考虑写入吞吐量并额外预留 50% 的 CPU 资源,以确保查询性能和稳定性。也可以根据需要调整此比例。例如,如果查询较多,那么可以预留更多 CPU 资源。
+
+
+### 分区方法
+
+GreptimeDB 使用表达式定义分区规则。
+为获得最佳性能,建议选择均匀分布、稳定且与查询条件一致的分区键。
+
+例如:
+- 按 trace id 的前缀分区。
+- 按数据中心名称分区。
+- 按业务名称分区。
+
+分区键应该尽量贴合查询条件。
+例如,如果大多数查询针对特定数据中心的数据,那么可以使用数据中心名称作为分区键。
+如果不了解数据分布,可以对现有数据执行聚合查询以收集相关信息。
+更多详情,请参考[表分区指南](/user-guide/administration/manage-data/table-sharding.md#partition)。
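+
+例如,下面的示例(`sensors` 表及分区值为示意)按区域列 `area` 对表进行分区:
+
+```sql
+CREATE TABLE sensors (
+    device_id STRING,
+    area STRING,
+    reading DOUBLE,
+    ts TIMESTAMP TIME INDEX,
+    PRIMARY KEY (device_id, area)
+)
+PARTITION ON COLUMNS (area) (
+    area < 'South',
+    area >= 'South'
+);
+```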
diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/concepts/data-model.md b/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/concepts/data-model.md
index 44ab6588f..bd9502baa 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/concepts/data-model.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/concepts/data-model.md
@@ -9,16 +9,18 @@ description: 介绍 GreptimeDB 的数据模型,包括表的结构、列类型
GreptimeDB 使用时序表来进行数据的组织、压缩和过期管理。数据模型主要基于关系型数据库中的表模型,同时考虑到了指标(metrics)、日志(logs)及事件(events)数据的特点。
-GreptimeDB 中的所有数据都被组织成表,每个表中的数据项由三种类型的列组成:`Tag`、`Timestamp` 和 `Field`。
+GreptimeDB 中的所有数据都被组织成具有名称的表,每个表中的数据项由三种语义类型的列组成:`Tag`、`Timestamp` 和 `Field`。
- 表名通常与指标、日志的名称相同。
-- `Tag` 列中存储经常被查询的元数据,其中的值是数据源的标签,通常用于描述数据的特定特征。`Tag` 列具有索引,所以使用 `Tag` 列的查询具备良好的性能。
-- `Timestamp` 是指标、日志及事件的时序数据库的基础,它表示数据生成的日期和时间。Timestamp 具有索引,所以使用 `Timestamp` 的查询具有良好的性能。一个表只能有一个 `Timestamp` 列,被称为时间索引列。
-- 其他列是 `Field` 列,其中的值是被收集的数据指标或日志。这些指标通常是数值或字符串,但也可能是其他类型的数据,例如地理位置。`Field` 列默认情况下没有被索引,对该字段做过滤查询会全表扫描。这可能会消耗大量资源并且性能较差,但是字符串字段可以启用[全文索引](/user-guide/logs/query-logs.md#全文索引加速搜索),以加快日志搜索等查询的速度。
+- `Tag` 列唯一标识时间序列。具有相同 `Tag` 值的行属于同一个时间序列。有些 TSDB 也可能称它们为 label。
+- `Timestamp` 是指标、日志和事件数据库的基础。它表示数据生成的日期和时间。一个表只能有一个具有 `Timestamp` 语义类型的列,也称为时间索引(`Time Index`)列。
+- 其他列是 `Field` 列。字段包含收集的数据指标或日志内容。这些字段通常是数值或字符串,但也可能是其他类型的数据,例如地理位置或时间戳。
-### Metric 表
+表按时间序列对行进行组织,并按 `Timestamp` 对同一时间序列的行进行排序。表还可以根据应用的需求对具有相同 `Tag` 和 `Timestamp` 值的行进行去重。GreptimeDB 按时间序列存储和处理数据。选择正确的表结构对于高效的数据存储和查询至关重要;请参阅[表设计指南](/user-guide/administration/design-table.md)了解更多详情。
-假设我们有一个名为 `system_metrics` 的时间序列表用于监控独立设备的资源使用情况。
+### 指标
+
+假设我们有一个名为 `system_metrics` 的表,用于监控数据中心中机器的资源使用情况:
```sql
CREATE TABLE IF NOT EXISTS system_metrics (
@@ -37,18 +39,17 @@ CREATE TABLE IF NOT EXISTS system_metrics (

-这与大家熟悉的表模型非常相似。不同之处在于 `Timestamp` 约束,它用于将 `ts` 列指定为此表的时间索引列。
+这与大家熟悉的表模型非常相似。不同之处在于 `TIME INDEX` 约束,它用于将 `ts` 列指定为此表的时间索引列。
- 表名为 `system_metrics`。
-- 对于 `Tag` 列,`host` 列表示收集的独立机器的主机名,`idc` 列显示机器所在的数据中心。这些是查询元数据,可以在查询时有效地过滤数据。
-- `Timestamp` 列 `ts` 表示收集数据的时间。使用该列查询具有时间范围的数据时具备较高的性能。
-- `Field` 列中的 `cpu_util`、`memory_util`、`disk_util` 和 `load` 列分别表示机器的 CPU 利用率、内存利用率、磁盘利用率和负载。
- 这些列包含实际的数据并且不被索引,但是可以被高效地计算,例如求最大最小值、均值和百分比分布等。
- 请避免在查询条件中使用 `Field` 列,这会消耗大量资源并且性能较差。
+- `PRIMARY KEY` 约束指定了表的 `Tag` 列。`host` 列表示收集的独立机器的主机名,`idc` 列显示机器所在的数据中心。
+- `Timestamp` 列 `ts` 表示收集数据的时间。
+- `Field` 列中的 `cpu_util`、`memory_util`、`disk_util` 列分别表示机器的 CPU 利用率、内存利用率和磁盘利用率。这些列包含实际的数据。
+- 表按 `host`、`idc`、`ts` 对行进行排序和去重。因此,查询 `select count(*) from system_metrics` 需要扫描所有的行做统计。
-### Log 表
+### 事件
-你还可以创建一个日志表用于存储访问日志:
+另一个例子是创建一个用于事件(如访问日志)的表:
```sql
CREATE TABLE access_logs (
@@ -58,27 +59,25 @@ CREATE TABLE access_logs (
http_method STRING,
http_refer STRING,
user_agent STRING,
- request STRING FULLTEXT INDEX,
- PRIMARY KEY (http_status, http_method)
+ request STRING,
) with ('append_mode'='true');
```
-其中:
- 时间索引列为 `access_time`。
-- `http_status`、`http_method` 为 Tag。
-- `remote_addr`、`http_refer`、`user_agent`、`request` 为 Field。`request` 是通过 [`FULLTEXT INDEX` 列选项](/reference/sql/create.md#fulltext-列选项)启用全文索引的字段。
-- 这个表是一个用于存储日志的 [append-only 表](/reference/sql/create.md#创建-append-only-表)。它允许一个主键下存在重复的时间戳。
+- 没有 tag 列。
+- `http_status`、`http_method`、`remote_addr`、`http_refer`、`user_agent` 和 `request` 是字段列。
+- 表按 `access_time` 对行进行排序。
+- 这个表是一个用于存储不需要去重的日志的[append-only 表](/reference/sql/create.md#创建-append-only-表)。
+- 查询 append-only 表一般会更快。例如,`select count(*) from access_logs` 可以直接使用统计信息作为结果而不需要考虑重复。
要了解如何指定 `Tag`、`Timestamp` 和 `Field` 列,请参见[表管理](/user-guide/administration/manage-data/basic-table-operations.md#创建表)和 [CREATE 语句](/reference/sql/create.md)。
-当然,无论何时,你都可以将指标和日志放在一张表里,这也是 GreptimeDB 提供的关键能力。
-
## 设计考虑
GreptimeDB 基于表进行设计,原因如下:
- 表格模型易于学习,具有广泛的用户群体,我们只需引入时间索引的概念即可实现对指标、日志和事件的统一。
-- Schema 是描述数据特征的元数据,对于用户来说更方便管理和维护。通过引入 Schema 版本的概念,我们可以更好地管理数据兼容性。
+- Schema 是描述数据特征的元数据,对于用户来说更方便管理和维护。
- Schema 通过其类型、长度等信息带来了巨大的优化存储和计算的好处,我们可以进行有针对性的优化。
- 当我们有了表格 Schema 后,自然而然地引入了 SQL,并用它来处理各种表之间的关联分析和聚合查询,为用户抵消了学习和使用成本。
- 比起 OpenTSDB 和 Prometheus 采用的单值模型,GreptimeDB 使用多值模型使其中一行数据可以具有多列数据。多值模型面向数据源建模,一个 metric 可以有用 field 表示的值。多值模型的优势在于它可以一次性向数据库写入或读取多个值,从而减少传输流量并简化查询。相比之下,单值模型则需要将数据拆分成多个记录。阅读[博客](https://greptime.com/blogs/2024-05-09-prometheus)以获取更多详情。