VLAB fails to run iperf server or client #310

Open
pau-hedgehog opened this issue Jan 13, 2025 · 5 comments · May be fixed by #342

Observed an SSH connection failure to a server in:
https://github.com/githedgehog/fabricator/actions/runs/12743481149/job/35513736851

09:27:19 DBG Ping result from=server-06 to=server-01 expected=false ok=false fail=true err="Process exited with status 1" out="PING 10.0.1.2 (10.0.1.2) 56(84) bytes of data.\n\n--- 10.0.1.2 ping statistics ---\n5 packets transmitted, 0 received, 100% packet loss, time 4121ms"
09:27:21 DBG IPerf3 result from=server-03 to=server-04 sendSpeed="4.16 Mbps" receiveSpeed="3.94 Mbps" sent="2.60 MB" received="2.47 MB"
09:27:21 DBG Running iperf from=server-06 to=server-05
09:27:27 DBG IPerf3 result from=server-06 to=server-05 sendSpeed="27.16 Mbps" receiveSpeed="26.66 Mbps" sent="16.97 MB" received="16.79 MB"
09:27:27 DBG Running iperf from=server-10 to=server-09
09:27:34 DBG IPerf3 result from=server-10 to=server-09 sendSpeed="8.05 Mbps" receiveSpeed="7.73 Mbps" sent="5.03 MB" received="4.86 MB"
09:27:34 DBG Running iperf from=server-08 to=server-07
09:27:40 DBG IPerf3 result from=server-08 to=server-07 sendSpeed="2.71 Mbps" receiveSpeed="2.45 Mbps" sent="1.70 MB" received="1.54 MB"
09:27:40 DBG Running iperf from=server-04 to=server-03
09:27:46 DBG IPerf3 result from=server-04 to=server-03 sendSpeed="4.96 Mbps" receiveSpeed="4.69 Mbps" sent="3.10 MB" received="2.94 MB"
09:27:46 DBG Running iperf from=server-05 to=server-06
09:27:53 DBG IPerf3 result from=server-05 to=server-06 sendSpeed="11.22 Mbps" receiveSpeed="10.91 Mbps" sent="7.01 MB" received="6.87 MB"
09:27:53 DBG Running iperf from=server-07 to=server-08
09:27:59 DBG IPerf3 result from=server-07 to=server-08 sendSpeed="2.72 Mbps" receiveSpeed="2.52 Mbps" sent="1.70 MB" received="1.58 MB"
09:27:59 DBG Running iperf from=server-09 to=server-10
09:28:06 DBG IPerf3 result from=server-09 to=server-10 sendSpeed="3.87 Mbps" receiveSpeed="3.56 Mbps" sent="2.42 MB" received="2.24 MB"
09:28:06 ERR Error(s) during testing connectivity
09:28:06 ERR Error key=vpcpeer--server-01--server-02 err="checking iperf from server-01 to server-02: running iperf: running iperf server: Process exited with status 1: Directory tree /var/lib/toolbox/core-ghcr.io_githedgehog_toolbox-latest is currently busy.\n"
09:28:06 ERR Error key=vpcpeer--server-02--server-01 err="checking iperf from server-02 to server-01: running iperf: running iperf server: ssh: rejected: connect failed (open failed): "
09:28:06 WRN Error running on-ready commands err="testing connectivity: testing connectivity: testing connectivity from \"server-02\" to \"server-01\": checking iperf from server-02 to server-01: running iperf: running iperf server: ssh: rejected: connect failed (open failed): "
09:28:21 DBG Force exit with code 2 err="context canceled"

pau-hedgehog commented Jan 24, 2025

https://github.com/githedgehog/fabricator/actions/runs/12938876629/job/36090429097

show-tech-vlab-spine-leaf-true-usb.zip

00:07:12 ERR Error key=vpcpeer--server-07--server-08 err="checking iperf from server-07 to server-08: running iperf: running iperf server: ssh: rejected: connect failed (open failed): "

This is the part of the code triggering the error:

func checkIPerf(ctx context.Context, opts TestConnectivityOpts, iperfs *semaphore.Weighted, from, to string, fromSSH, toSSH *goph.Client, toIP netip.Addr, expected bool) error {
    if opts.IPerfsSeconds <= 0 || !expected {
        return nil
    }
    
    if err := iperfs.Acquire(ctx, 1); err != nil {
        return fmt.Errorf("acquiring iperf semaphore: %w", err)
    }
    defer iperfs.Release(1)
    
    ctx, cancel := context.WithTimeout(ctx, time.Duration(opts.IPerfsSeconds+30)*time.Second)
    defer cancel()

    slog.Debug("Running iperf", "from", from, "to", to)
    
    g, ctx := errgroup.WithContext(ctx)
    
    g.Go(func() error {
        out, err := toSSH.RunContext(ctx, fmt.Sprintf("toolbox -q timeout -v %d iperf3 -s -1", opts.IPerfsSeconds+25))
        if err != nil {
            return fmt.Errorf("running iperf server: %w: %s", err, string(out))
        }
    
        return nil
    })

pau-hedgehog added a commit that referenced this issue Jan 24, 2025
Extend show tech logs gathered on server VMs

Fixes #310

Signed-off-by: Pau Capdevila <[email protected]>
pau-hedgehog linked a pull request on Jan 24, 2025 that will close this issue
pau-hedgehog changed the title from "VLAB random SSH connectivity failures" to "VLAB fails to run iperf server or client" on Jan 27, 2025

pau-hedgehog commented Jan 27, 2025

I'm consolidating in this issue the different kinds of errors around the logic that spawns an iperf client/server pair, as they could be related. For instance, there are cases where the client side fails to run with:

11:40:50 ERR Error key=vpcpeer--server-10--server-09 err="checking iperf from server-10 to server-09: running iperf: running iperf client: Process exited with status 1: {\r\n\t\"start\":\t{\r\n\t\t\"connected\":\t[],\r\n\t\t\"version\":\t\"iperf 3.9\",\r\n\t\t\"system_info\":\t\"Linux server-10 6.6.60-flatcar #1 SMP PREEMPT_DYNAMIC Tue Nov 12 16:20:46 -00 2024 x86_64\"\r\n\t},\r\n\t\"intervals\":\t[],\r\n\t\"end\":\t{\r\n\t}\r\n}\r\niperf3: error - unable to send control message: Bad file descriptor\r\n"

Seen in: https://github.com/githedgehog/fabricator/actions/runs/12948500831/job/36117474806

show-tech-vlab-spine-leaf-true-iso.zip

@pau-hedgehog

Another hit on the client side:

https://github.com/githedgehog/fabricator/actions/runs/12948500831/job/36117474806

11:41:23 WRN Error running on-ready commands err="testing connectivity: testing connectivity: testing connectivity from \"server-10\" to \"server-09\": checking iperf from server-10 to server-09: running iperf: running iperf client: Process exited with status 1: {\r\n\t\"start\":\t{\r\n\t\t\"connected\":\t[],\r\n\t\t\"version\":\t\"iperf 3.9\",\r\n\t\t\"system_info\":\t\"Linux server-10 6.6.60-flatcar #1 SMP PREEMPT_DYNAMIC Tue Nov 12 16:20:46 -00 2024 x86_64\"\r\n\t},\r\n\t\"intervals\":\t[],\r\n\t\"end\":\t{\r\n\t}\r\n}\r\niperf3: error - unable to send control message: Bad file descriptor\r\n"

Looking into this error message, I found that it appears when the server is not yet ready:
esnet/iperf#1233 (comment)

pau-hedgehog added a commit that referenced this issue Jan 29, 2025
Extend show tech logs gathered on server VMs

Fixes #310

Signed-off-by: Pau Capdevila <[email protected]>