Added some useful info from Misha #134

Open
wants to merge 2 commits into
base: main
@@ -0,0 +1,38 @@
---
title: "Notes on Various Errors with respect to replication and distributed connections"
linkTitle: "Notes on Various Errors with respect to replication and distributed connections"
description: >
Notes on errors related to replication and distributed connections
keywords:
- replication
- distributed connections
---
# Notes on Various Errors with respect to replication and distributed connections

## `ClickHouseDistributedConnectionExceptions`

This alert usually indicates that one of the nodes isn’t responding or that there’s an interconnectivity issue. Debug steps:

### 1. Check Cluster Connectivity
Verify connectivity inside the cluster by running:
```
SELECT count() FROM clusterAllReplicas('{cluster}', cluster('{cluster}', system.one))
```
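
If the check above fails or hangs, it can help to see which replicas answer at all. The following is a minimal sketch, assuming the `{cluster}` macro is defined on the node where you run it and that `skip_unavailable_shards` is acceptable for a diagnostic query. With that setting, unreachable replicas should be skipped instead of failing the query, so a host missing from the first result is the signal.

```sql
-- Replicas that answered; unreachable ones are skipped rather than failing the query.
SELECT hostName() AS responding_host
FROM clusterAllReplicas('{cluster}', system.one)
ORDER BY responding_host
SETTINGS skip_unavailable_shards = 1;

-- Replicas the cluster definition expects, for comparison.
SELECT host_name, host_address, port
FROM system.clusters
WHERE cluster = getMacro('cluster');
```

Note that `hostName()` returns the server's own hostname, which may differ from the `host_name` configured in the cluster definition, so the two lists may need to be matched by hand.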

### 2. Check for Errors
Run the following queries to see if any nodes report errors:

```
SELECT hostName(), * FROM clusterAllReplicas('{cluster}', system.clusters) WHERE errors_count > 0;
SELECT hostName(), * FROM clusterAllReplicas('{cluster}', system.errors) WHERE last_error_time > now() - 3600 ORDER BY value;
```

Depending on the results, ensure that the affected node is up and responding to queries. Also, verify that connectivity (DNS, routes, delays) is functioning correctly.
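
To narrow the `system.errors` output to connection problems, a filter on error names can be added. This is a sketch only: the list of names below is an assumption about which errors are most relevant to this alert, not an exhaustive or authoritative set, and the one-hour window simply mirrors the query above.

```sql
-- Recent network-related errors per node (the name list is illustrative).
SELECT
    hostName() AS host,
    name,
    code,
    value,
    last_error_time,
    last_error_message
FROM clusterAllReplicas('{cluster}', system.errors)
WHERE last_error_time > now() - INTERVAL 1 HOUR
  AND name IN ('NETWORK_ERROR', 'SOCKET_TIMEOUT', 'ALL_CONNECTION_TRIES_FAILED')
ORDER BY last_error_time DESC;
```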

## `ClickHouseReplicatedPartChecksFailed` & `ClickHouseReplicatedPartFailedFetches`

Unless you’re seeing huge numbers, these alerts can generally be ignored. They’re often a sign of temporary replication issues that ClickHouse resolves on its own. However, if the issue persists or the counts grow rapidly, debug replication as follows:

* Check the replication status in `system.replicas` and `system.replication_queue`.
* Examine server logs, `system.errors`, and system load for clues.
* Try restarting the replica (`SYSTEM RESTART REPLICA db_name.table_name`) and, if necessary, contact Altinity support.
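
The first check in the list above could look like the sketch below. The thresholds (`queue_size > 20`, `absolute_delay > 300`, `num_tries > 100`) are arbitrary assumptions for illustration; adjust them to what is normal for your cluster.

```sql
-- Replicas that look unhealthy: read-only, expired ZooKeeper session, long queue, or lagging.
SELECT
    database,
    table,
    is_readonly,
    is_session_expired,
    queue_size,
    inserts_in_queue,
    merges_in_queue,
    absolute_delay
FROM system.replicas
WHERE is_readonly OR is_session_expired OR queue_size > 20 OR absolute_delay > 300
ORDER BY absolute_delay DESC;

-- Queue entries that keep failing, with the last exception for each.
SELECT
    database,
    table,
    type,
    new_part_name,
    num_tries,
    last_exception,
    postpone_reason
FROM system.replication_queue
WHERE num_tries > 100 OR last_exception != ''
ORDER BY num_tries DESC
LIMIT 20;
```

Both queries read local system tables, so run them on the replica that raised the alert, or wrap the tables in `clusterAllReplicas('{cluster}', ...)` as in the earlier examples.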
10 changes: 10 additions & 0 deletions content/en/altinity-kb-useful-queries/detached-parts.md
@@ -73,3 +73,13 @@ covered-by-broken - that means that ClickHouse during initialization of replica
```

The list of DETACH_REASONS: https://github.com/ClickHouse/ClickHouse/blob/master/src/Storages/MergeTree/MergeTreePartInfo.h#L163

## More notes on ClickHouseDetachedParts

Detached parts act like the “Recycle Bin” in Windows. When ClickHouse deems some data unneeded (often during internal reconciliation at server startup), it moves the data to the `detached` directory instead of deleting it immediately.

Recovery: If you’re missing data due to misconfiguration or an error (such as connecting to the wrong ZooKeeper), check the detached parts. The missing data might be recoverable through manual intervention.

Cleanup: Otherwise, clean up the detached parts periodically to free disk space.

ClickHouse deliberately has no automatic cleanup for detached parts. Data can end up there because of a bug in ClickHouse's code or a hardware fault (such as a memory error or disk failure), and in those cases deleting it automatically is not desirable.
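
Before recovering or cleaning anything up, it helps to see what is actually in the detached area. The sketch below summarizes `system.detached_parts` by reason; the `ATTACH` and `DROP DETACHED` statements are left commented out, and `db_name.table_name` and `'part_name'` are placeholders, not real objects. Parts detached with a reason prefix (for example `broken`) may need further inspection before they can be attached.

```sql
-- How many detached parts each table has, grouped by the reason they were detached.
SELECT
    database,
    table,
    reason,
    count() AS detached_parts
FROM system.detached_parts
GROUP BY database, table, reason
ORDER BY detached_parts DESC;

-- Recovery (placeholder names): re-attach a part after confirming it holds data you need.
-- ALTER TABLE db_name.table_name ATTACH PART 'part_name';

-- Cleanup (placeholder names): drop a detached part you have confirmed is not needed.
-- ALTER TABLE db_name.table_name DROP DETACHED PART 'part_name' SETTINGS allow_drop_detached = 1;
```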