27 changes: 21 additions & 6 deletions tidb-cloud/built-in-monitoring.md
@@ -82,27 +82,42 @@
| TiFlash IO MBps | node-write, node-read | The total bytes of read and write in each TiFlash node. |
| TiFlash Storage Usage | node, limit | The storage usage statistics or upper limit of each TiFlash node. |

## Metrics for {{{ .starter }}} and Essential clusters
## Metrics for {{{ .starter }}} and {{{ .essential }}} clusters

The **Metrics** page provides two tabs for metrics of {{{ .starter }}} and {{{ .essential }}} clusters:

- **Cluster Status**: displays the cluster-level main metrics.
- **Overview**: displays the cluster-level core metrics.
- **Cluster Status**: displays the cluster-level main advanced metrics.
- **Database Status**: displays the database-level main metrics.

### Overview
| Metric name | Labels | Description |
| :------------| :------| :-------------------------------------------- |
| Request Units | RU per second | The Request Unit (RU) is a unit of measurement used to track the resource consumption of a query or transaction to the {{{ .starter }}} cluster. In addition to queries that you run, Request Units can be consumed by background activities, so when the QPS is 0, the Request Units per second might not be zero. |
Suggested change
| Request Units | RU per second | The Request Unit (RU) is a unit of measurement used to track the resource consumption of a query or transaction to the {{{ .starter }}} cluster. In addition to queries that you run, Request Units can be consumed by background activities, so when the QPS is 0, the Request Units per second might not be zero. |
| Request Units | RU per second | The Request Unit (RU) is a unit of measurement used to track the resource consumption of a query or transaction in a {{{ .starter }}} cluster. Besides user queries, background activities can also consume RUs, so when QPS is 0, RU usage per second might still be nonzero. |

| Capacity vs Usage (RU/s) | Provisioned capacity (RCU), Consumed RU/s | The provisioned capacity (RCU) and the consumed Request Units (RU) per second to the {{{ .essential }}} clusters. |
| Used Storage Size | Row-based storage, Columnar storage | The size of the row store and the size of the column store. |
| Query Per Second | All | The number of SQL statements executed per second, which are collected by SQL types, such as `SELECT`, `INSERT`, and `UPDATE`. |
| Query Duration | All | The duration from receiving a request from the client to the {{{ .starter }}} or {{{ .essential }}} cluster until the cluster executes the request and returns the result to the client. |
| Total Connection | All | The number of connections to the {{{ .starter }}} or {{{ .essential }}} cluster. |

### Cluster Status

The following table illustrates the cluster-level main metrics under the **Cluster Status** tab.
The following table illustrates the cluster-level main advanced metrics under the **Cluster Status** tab.

| Metric name | Labels | Description |
| :------------| :------| :-------------------------------------------- |
| Request Units | RU per second | The Request Unit (RU) is a unit of measurement used to track the resource consumption of a query or transaction. In addition to queries that you run, Request Units can be consumed by background activities, so when the QPS is 0, the Request Units per second might not be zero. |
| Request Units | RU per second | The Request Unit (RU) is a unit of measurement used to track the resource consumption of a query or transaction to the {{{ .starter }}} cluster. In addition to queries that you run, Request Units can be consumed by background activities, so when the QPS is 0, the Request Units per second might not be zero. |
| Capacity vs Usage (RU/s) | Provisioned capacity (RCU), Consumed RU/s | The provisioned capacity (RCU) and the consumed Request Units (RU) per second to the {{{ .essential }}} clusters. |
| Used Storage Size | Row-based storage, Columnar storage | The size of the row store and the size of the column store. |
| Query Per Second | All, {SQL type} | The number of SQL statements executed per second, which are collected by SQL types, such as `SELECT`, `INSERT`, and `UPDATE`. |
| Average Query Duration | All, {SQL type} | The duration from receiving a request from the client to the {{{ .starter }}} or {{{ .essential }}} cluster until the cluster executes the request and returns the result to the client. |
| Query Duration | avg-{SQL type}, P99-{SQL type} | The average or the 99th percentile duration from receiving a request from the client to the {{{ .starter }}} or {{{ .essential }}} cluster until the cluster executes the request and returns the result to the client. |

Check failure on line 113 in tidb-cloud/built-in-monitoring.md (GitHub Actions / vale, reported by reviewdog): [PingCAP.Ordinal] Spell out all ordinal numbers ('99th') in text. Location: line 113, column 72. Severity: ERROR.

| Failed Query | All | The number of SQL statement execution errors per second. |
| Transaction Per Second | All | The number of transactions executed per second. |
| Average Transaction Duration | All | The average execution duration of transactions. |
| Transaction Duration | avg, P99 | The average and the 99th percentile execution duration of transactions. |

Check failure on line 116 in tidb-cloud/built-in-monitoring.md (GitHub Actions / vale, reported by reviewdog): [PingCAP.Ordinal] Spell out all ordinal numbers ('99th') in text. Location: line 116, column 57. Severity: ERROR.

| Lock wait | P95, P99 | The 95th and the 99th percentile durations are the times taken by transactions waiting to acquire pessimistic locks. High values indicate contention for the same rows or keys. |

Check failure on line 117 in tidb-cloud/built-in-monitoring.md (GitHub Actions / vale, reported by reviewdog): [PingCAP.Ordinal] Spell out all ordinal numbers ('99th') in text. Location: line 117, column 43. Severity: ERROR.

Check failure on line 117 in tidb-cloud/built-in-monitoring.md (GitHub Actions / vale, reported by reviewdog): [PingCAP.Ordinal] Spell out all ordinal numbers ('95th') in text. Location: line 117, column 30. Severity: ERROR.

| Total Connection | All | The number of connections to the {{{ .starter }}} or {{{ .essential }}} cluster. |
| Idle Connection Duration | P99, P99(in-txn), P99(not-in-txn) | The 99th percentile time connections remained idle while inside an open transaction. Long values usually indicate slow app logic or long-running transactions. |

Check failure on line 119 in tidb-cloud/built-in-monitoring.md (GitHub Actions / vale, reported by reviewdog): [PingCAP.Ordinal] Spell out all ordinal numbers ('99th') in text. Location: line 119, column 70. Severity: ERROR.


medium

The current description only explains the P99(in-txn) label, but the labels also include P99 and P99(not-in-txn). To make the description complete, it should explain all the labels. [1]

Suggested change
| Idle Connection Duration | P99, P99(in-txn), P99(not-in-txn) | The 99th percentile time connections remained idle while inside an open transaction. Long values usually indicate slow app logic or long-running transactions. |
| Idle Connection Duration | P99, P99(in-txn), P99(not-in-txn) | The 99th percentile of time that connections remained idle. `P99(in-txn)` shows idle time within an open transaction, while `P99(not-in-txn)` shows idle time outside of a transaction. Long values usually indicate slow app logic or long-running transactions. |

Style Guide References

Footnotes

  1. The documentation should be complete.
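
Several rows in this diff pair an average with a P99 label (Query Duration, Transaction Duration). As a reading aid, here is a minimal sketch of how the two statistics can diverge on the same data. It uses the nearest-rank percentile method on made-up latency samples; the function name, the sample values, and the choice of method are all assumptions for illustration, and the diff does not state how TiDB Cloud actually computes these metrics.

```python
def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile (integer pct, 1..100) by nearest rank.

    Hypothetical helper for illustration only; not a TiDB Cloud API.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # 1-based rank = ceil(pct * n / 100), computed with integer arithmetic
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


# Made-up query durations in milliseconds: 98 fast queries and 2 slow ones.
durations_ms = [10] * 98 + [500, 900]

avg_ms = sum(durations_ms) / len(durations_ms)
p95_ms = percentile_nearest_rank(durations_ms, 95)
p99_ms = percentile_nearest_rank(durations_ms, 99)

print(f"avg={avg_ms:.1f} ms, P95={p95_ms} ms, P99={p99_ms} ms")
```

In this sample the average stays low while P99 lands on one of the slow queries, which is why a row that exposes both labels benefits from a description that covers each of them.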



### Database Status
