From 4320932039812c267cf99e4b461ae2e2d902ea05 Mon Sep 17 00:00:00 2001
From: Roberto Seldner
Date: Wed, 26 Mar 2025 17:27:52 -0700
Subject: [PATCH 1/5] Update verify-zookeeper-sync-status.md

Revisiting this PR that fell through the cracks on my end
https://github.com/elastic/cloud/pull/108690

Proposing an alternative command that will provide a cleaner output and not rely on the use of additional system packages. We can achieve the same thing using `curl`, which I think is just about ubiquitously available on Linux hosts.

The new version is written as an inline shell script. So technically not a `one-liner`, but still easily copy/pasteable and much more readable.

Equivalent one-liner in the current comment:

container level
```
docker exec frc-zookeeper-servers-zookeeper sh -c 'for i in $(seq 2191 2199); do output=$(echo mntr | curl -s telnet://localhost:$i | grep -E "server_state|leader|follower|not currently serving|zk_znode_count"); [ -n "$output" ] && echo "ZK mntr Response from port $i:" && echo "$output" && break; done'
```

host level
```
for i in $(seq 2191 2199); do output=$(echo mntr | curl -s telnet://localhost:$i | grep -E "server_state|leader|follower|not currently serving|zk_znode_count"); [ -n "$output" ] && echo "ZK mntr Response from port $i:" && echo "$output" && break; done
```

I've used a grep that will closely match the one-liner output without the noisy `trying port` lines...
```
ZK mntr Response from port 2191:
zk_server_state leader
zk_znode_count 783
zk_synced_followers 0
zk_synced_non_voting_followers 0
zk_leader_uptime 795608083
zk_avg_leader_unavailable_time 293.0
zk_min_leader_unavailable_time 293
zk_max_leader_unavailable_time 293
zk_cnt_leader_unavailable_time 1
zk_sum_leader_unavailable_time 293
zk_avg_follower_sync_time 0.0
zk_min_follower_sync_time 0
zk_max_follower_sync_time 0
zk_cnt_follower_sync_time 0
zk_sum_follower_sync_time 0
```

but we could consider this grep for an even cleaner output
```
grep -E "^zk_server_state|^zk_followers|^zk_synced_followers|not currently serving|^zk_znode_count"
```

Leader Output:
```
ZK mntr Response from port 2191:
zk_server_state leader
zk_znode_count 783
zk_synced_followers 0
```
---
 .../verify-zookeeper-sync-status.md | 89 +++++++------------
 1 file changed, 32 insertions(+), 57 deletions(-)

diff --git a/troubleshoot/deployments/cloud-enterprise/verify-zookeeper-sync-status.md b/troubleshoot/deployments/cloud-enterprise/verify-zookeeper-sync-status.md
index c0c42e7da..02b17d643 100644
--- a/troubleshoot/deployments/cloud-enterprise/verify-zookeeper-sync-status.md
+++ b/troubleshoot/deployments/cloud-enterprise/verify-zookeeper-sync-status.md
@@ -16,10 +16,18 @@ It is recommended to check the ZooKeeper sync status before starting any mainten

 To check that ZooKeeper is in sync with the correct number of followers, run the following steps:

-1. Run the one-line command on each Director node:
+1. Run the inline shell script command on each Director node:

     ```sh
-    docker exec frc-zookeeper-servers-zookeeper sh -c 'for i in $(seq 2191 2199); do echo trying port: $i;echo mntr | nc localhost ${i} 2>/dev/null | grep "not currently serving";echo mntr | nc localhost ${i} 2>/dev/null| grep leader; echo mntr | $(which nc) localhost ${i} 2>/dev/null | grep follower ; done'
+    docker exec frc-zookeeper-servers-zookeeper sh -c '
+    for i in $(seq 2191 2199); do
+      output=$(echo mntr | curl -s telnet://localhost:$i | grep -E "server_state|leader|follower|not currently serving|zk_znode_count");
+      if [ -n "$output" ]; then
+        echo "ZK mntr Response from port $i:";
+        echo "$output";
+        break;
+      fi
+    done'
     ```

     ::::{note}
@@ -32,53 +40,32 @@ To check that ZooKeeper is in sync with the correct number of followers, run the
     * All followers are listed as synced


-The one-line command can return the following types of output:
+The inline shell script command can return the following types of output:

 * If the host is the current ZooKeeper Leader, the command returns the Leader’s info including follower count and follower sync status.

     ```
-    trying port: 2191
+    ZK mntr Response from port 2191:
     zk_server_state leader
-    zk_followers 2
+    zk_znode_count 783
     zk_synced_followers 2
-    trying port: 2192
-    trying port: 2193
-    trying port: 2194
-    trying port: 2195
-    trying port: 2196
-    trying port: 2197
-    trying port: 2198
-    trying port: 2199
+    ...
     ```

 * If the host is a follower, the command returns only the follower state, and continues until it finds the Leader:

     ```
-    trying port: 2191
-    trying port: 2192
-    trying port: 2193
+    ZK mntr Response from port 2193:
     zk_server_state follower
-    trying port: 2194
-    trying port: 2195
-    trying port: 2196
-    trying port: 2197
-    trying port: 2198
-    trying port: 2199
+    zk_znode_count 777
+    ...
     ```

 * If the ZooKeeper container is up and listening, but the current node doesn’t have the quorum, the command returns the message `This ZooKeeper instance is not currently serving requests`:

     ```
-    trying port: 2191
-    trying port: 2192
+    ZK mntr Response from port 2192:
     This ZooKeeper instance is not currently serving requests
-    trying port: 2193
-    trying port: 2194
-    trying port: 2195
-    trying port: 2196
-    trying port: 2197
-    trying port: 2198
-    trying port: 2199
     ```


@@ -86,32 +73,20 @@ Make sure the ZooKeeper container is running on all the Director nodes. If anoth

 If there is no response on any port, it’s possible that no ZooKeeper ports are currently listening (for ex. running on a non-Director role host, or the ZooKeeper Docker container is not running)

-```
-trying port: 2191
-trying port: 2192
-trying port: 2193
-trying port: 2194
-trying port: 2195
-trying port: 2196
-trying port: 2197
-trying port: 2198
-trying port: 2199
-```

-If the one line command doesn’t work, use telnet:
-
-1. Run `docker ps | grep zoo` to reveal the port in use by the ZooKeeper container on the current host. The port won’t change once the container is started.
-2. Install and run telnet, `telnet localhost ` then type `mntr`
-
-    * The port is in the range from 2191 to 2199.
-    * for example `telnet localhost 2191`
-
-3. Look for the following output lines:
-
-    * `zk_server_state leader` or `zk_server_state follower` to indicate node leadership
-    * Lines indicating the follower count and sync status when run against a Leader node
-
-        * `zk_followers 2`
-        * `zk_synced_followers 2`
+If the inline shell script command doesn’t work (e.g. your user lacks permissions to access docker), you can run the check directly from the director host. This approach avoids entering the container and doesn't require installing additional tools like `telnet` or `nc`, relying instead on `curl`, which is typically available by default on most Linux systems.

+1. Run the equivalent inline shell script directly on the host terminal (outside of the zookeeper container)
+    ```
+    for i in $(seq 2191 2199); do
+      output=$(echo mntr | curl -s telnet://localhost:$i | grep -E "server_state|leader|follower|not currently serving|zk_znode_count");
+      if [ -n "$output" ]; then
+        echo "ZK mntr Response from port $i:";
+        echo "$output";
+        break;
+      fi
+    done
+    ```
+2. Look for the following lines in the output (just as noted above)
+    * `zk_server_state leader` or `zk_server_state follower` — indicates the node’s ZooKeeper role

From 886687d52c018518ad242ec93ea7ef103db17fac Mon Sep 17 00:00:00 2001
From: Roberto Seldner
Date: Wed, 16 Apr 2025 06:57:02 -0700
Subject: [PATCH 2/5] Update troubleshoot/deployments/cloud-enterprise/verify-zookeeper-sync-status.md

Co-authored-by: Kuni Sen <30574753+kunisen@users.noreply.github.com>
---
 .../cloud-enterprise/verify-zookeeper-sync-status.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/troubleshoot/deployments/cloud-enterprise/verify-zookeeper-sync-status.md b/troubleshoot/deployments/cloud-enterprise/verify-zookeeper-sync-status.md
index 02b17d643..ef1951ab1 100644
--- a/troubleshoot/deployments/cloud-enterprise/verify-zookeeper-sync-status.md
+++ b/troubleshoot/deployments/cloud-enterprise/verify-zookeeper-sync-status.md
@@ -74,7 +74,10 @@ Make sure the ZooKeeper container is running on all the Director nodes. If anoth
 If there is no response on any port, it’s possible that no ZooKeeper ports are currently listening (for ex. running on a non-Director role host, or the ZooKeeper Docker container is not running)


-If the inline shell script command doesn’t work (e.g. your user lacks permissions to access docker), you can run the check directly from the director host. This approach avoids entering the container and doesn't require installing additional tools like `telnet` or `nc`, relying instead on `curl`, which is typically available by default on most Linux systems.
+
+### Alternative: Check at host level
+
+If the inline shell script command doesn’t work, you can run the check directly from the director host. This can happen for example when your user lacks permissions to access Docker. This approach avoids entering the container and doesn't require installing additional tools like `telnet` or `nc`, relying instead on `curl`, which is typically available by default on most Linux systems.

 1. Run the equivalent inline shell script directly on the host terminal (outside of the zookeeper container)
     ```

From 8362f5e10323f4d26c191ac9f06c9aa6509f8dcb Mon Sep 17 00:00:00 2001
From: Roberto Seldner
Date: Wed, 16 Apr 2025 06:57:24 -0700
Subject: [PATCH 3/5] Update troubleshoot/deployments/cloud-enterprise/verify-zookeeper-sync-status.md

Co-authored-by: florent-leborgne
---
 .../cloud-enterprise/verify-zookeeper-sync-status.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/troubleshoot/deployments/cloud-enterprise/verify-zookeeper-sync-status.md b/troubleshoot/deployments/cloud-enterprise/verify-zookeeper-sync-status.md
index ef1951ab1..35a358c38 100644
--- a/troubleshoot/deployments/cloud-enterprise/verify-zookeeper-sync-status.md
+++ b/troubleshoot/deployments/cloud-enterprise/verify-zookeeper-sync-status.md
@@ -79,7 +79,7 @@ If there is no response on any port, it’s possible that no ZooKeeper ports are

 If the inline shell script command doesn’t work, you can run the check directly from the director host. This can happen for example when your user lacks permissions to access Docker. This approach avoids entering the container and doesn't require installing additional tools like `telnet` or `nc`, relying instead on `curl`, which is typically available by default on most Linux systems.

-1. Run the equivalent inline shell script directly on the host terminal (outside of the zookeeper container)
+1. Run the equivalent inline shell script directly on the host terminal, outside of the zookeeper container
     ```
     for i in $(seq 2191 2199); do
       output=$(echo mntr | curl -s telnet://localhost:$i | grep -E "server_state|leader|follower|not currently serving|zk_znode_count");

From ae6d7574d213cf9f1412c68c621bcfd8da2afb86 Mon Sep 17 00:00:00 2001
From: Roberto Seldner
Date: Wed, 16 Apr 2025 06:57:44 -0700
Subject: [PATCH 4/5] Update troubleshoot/deployments/cloud-enterprise/verify-zookeeper-sync-status.md

Co-authored-by: florent-leborgne
---
 .../cloud-enterprise/verify-zookeeper-sync-status.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/troubleshoot/deployments/cloud-enterprise/verify-zookeeper-sync-status.md b/troubleshoot/deployments/cloud-enterprise/verify-zookeeper-sync-status.md
index 35a358c38..152db2c9b 100644
--- a/troubleshoot/deployments/cloud-enterprise/verify-zookeeper-sync-status.md
+++ b/troubleshoot/deployments/cloud-enterprise/verify-zookeeper-sync-status.md
@@ -90,6 +90,6 @@ If the inline shell script command doesn’t work, you can run the check directl
       fi
     done
     ```
-2. Look for the following lines in the output (just as noted above)
+2. Look for the following lines in the output
     * `zk_server_state leader` or `zk_server_state follower` — indicates the node’s ZooKeeper role

From eb0a42bfb4e8fb1a3def20e04cb0fd7c88ad13e0 Mon Sep 17 00:00:00 2001
From: Roberto Seldner
Date: Wed, 16 Apr 2025 06:59:33 -0700
Subject: [PATCH 5/5] Update troubleshoot/deployments/cloud-enterprise/verify-zookeeper-sync-status.md

Co-authored-by: Kuni Sen <30574753+kunisen@users.noreply.github.com>
---
 .../cloud-enterprise/verify-zookeeper-sync-status.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/troubleshoot/deployments/cloud-enterprise/verify-zookeeper-sync-status.md b/troubleshoot/deployments/cloud-enterprise/verify-zookeeper-sync-status.md
index 152db2c9b..f94b19f94 100644
--- a/troubleshoot/deployments/cloud-enterprise/verify-zookeeper-sync-status.md
+++ b/troubleshoot/deployments/cloud-enterprise/verify-zookeeper-sync-status.md
@@ -14,6 +14,9 @@ It is recommended to check the ZooKeeper sync status before starting any mainten
 * The ECE UI **Settings** page displays all ZooKeeper nodes as connected, but not all the nodes have completed the syncing with the latest ZooKeeper state.
 * Connected ZooKeeper nodes participate in the quorum, but they don’t appear in the ECE UI **Settings** page. For example, if the host is removed, ECE no longer cares about it and keeps the ZooKeeper container part of the quorum.

+
+### Check at container level
+
 To check that ZooKeeper is in sync with the correct number of followers, run the following steps:

 1. Run the inline shell script command on each Director node:
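
Possible follow-up, not part of the patches above: the documented check still relies on reading the `mntr` lines by eye. A minimal sketch of how the comparison could be scripted is shown below. The function name `check_zk_sync`, its messages, and its exit codes are hypothetical, and the expected follower count is a value you supply yourself. It only reads the `zk_server_state` and `zk_synced_followers` keys that appear in the sample outputs.

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not part of the patch): reads ZooKeeper
# `mntr` output on stdin and compares zk_synced_followers with the expected
# follower count passed as $1.
check_zk_sync() {
  awk -v expected="$1" '
    $1 == "zk_server_state"     { state = $2 }
    $1 == "zk_synced_followers" { synced = $2; seen = 1 }
    END {
      # Leader with the expected number of synced followers: success.
      if (state == "leader" && seen && synced + 0 == expected + 0) {
        print "OK: leader reports " synced " synced followers"
        exit 0
      }
      # Leader, but the synced follower count is missing or different.
      if (state == "leader") {
        print "WARN: leader reports " (seen ? synced : "no") " synced followers, expected " expected
        exit 1
      }
      # Followers do not report the sync count, so nothing to compare here.
      if (state == "follower") {
        print "INFO: this node is a follower; run the check against the leader"
        exit 0
      }
      # No zk_server_state line, e.g. "not currently serving requests".
      print "WARN: no usable mntr response"
      exit 2
    }'
}
```

Assuming the leader answers on port 2191 as in the sample output, it could be fed from the same `curl` call the patch uses: `echo mntr | curl -s telnet://localhost:2191 | check_zk_sync 2`.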