
[Feature Request] Network performance benchmarks #586

Open

phansel opened this issue Jan 4, 2025 · 4 comments

@phansel

phansel commented Jan 4, 2025

How do users notice network degradation between exo nodes in regular operation, whether the link is WiFi, Thunderbolt, Ethernet, or something else? And how do they characterize those links, beyond simpler tools like iperf3?

A typical tool for this is NetPIPE: https://netpipe.cs.ksu.edu/ . It uses MPI (Open MPI, typically launched over SSH) to answer the question: for a message size of N bytes, what are the minimum latency and maximum bandwidth between any two given nodes?

Installation

NetPIPE won't compile as-is on macOS (mpicc, Apple clang-1600.0.26.4); the makefile must be modified to drop the -lrt flag, since macOS has no separate librt.

 makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/makefile b/makefile
index 0cbcbd5..d092575 100755
--- a/makefile
+++ b/makefile
@@ -34,7 +34,7 @@ CC         = gcc
 ifdef MACOSX_DEPLOYMENT_TARGET
    CFLAGS     = -g -O3 -Wall
 else
-   CFLAGS     = -g -O3 -Wall -lrt
+   CFLAGS     = -g -O3 -Wall
 endif
 SRC        = ./src
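Building the MPI driver also assumes an MPI toolchain is on PATH; a minimal setup on macOS, assuming Homebrew's open-mpi package, would be:

brew install open-mpi   # provides mpicc and mpirun
make mpi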

On Fedora 41 (mpicc, gcc version 14.2.1 20240912), a different patch is required to compile, likely because GCC 10+ defaults to -fno-common, so the int err; defined in this shared header collides across translation units at link time:

diff --git a/src/netpipe.h b/src/netpipe.h
index ed96aa6..5bff8d0 100644
--- a/src/netpipe.h
+++ b/src/netpipe.h
@@ -99,7 +99,7 @@
 
    // return error check variable and macro
 
-int err;
+//int err;
 
 #define ERRCHECK( _ltrue, _format, _args...) do {         \
    if( _ltrue ) {                                   \
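The same PATH assumption applies here; on a stock Fedora install this typically means installing Open MPI and loading its environment module (package and module names below are the usual Fedora defaults, adjust if yours differ):

sudo dnf install openmpi openmpi-devel
module load mpi/openmpi-x86_64   # puts mpicc and mpirun on PATH
make mpi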

np.hosts config

On any system, the self-to-self performance is measured with the following np.hosts file:

0.0.0.0 slots=2
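With that file, the loopback test uses the same NPmpi invocation shown below for two machines; a minimal run, assuming NetPIPE 5.x built with make mpi (np.self is just an arbitrary output filename):

mpirun -np 2 --hostfile np.hosts NPmpi -o np.self --start 1 --end 65536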

For a two-machine system, you'd use this kind of file:

host_0_ip slots=1
host_1_ip slots=1

Then build and run:

make mpi
mpirun -np 2 --hostfile np.hosts NPmpi -o np.mpi.qdr --start 1 --end 65536

The resulting output file can then be plotted with npplot, which depends on gnuplot: https://gitlab.beocat.ksu.edu/PeterGottesman/netpipe-5.x/-/blob/master/npplot?ref_type=heads .
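If npplot isn't handy, the data can also be plotted directly with gnuplot; this is a rough sketch that assumes the output file is whitespace-separated with message size in column 1 and throughput in column 2 (check the file or the npplot script for the actual column layout):

gnuplot -persist -e "set logscale x; set xlabel 'message size (bytes)'; set ylabel 'throughput (Mbps)'; plot 'np.mpi.qdr' using 1:2 with linespoints title 'NetPIPE'"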

I created a gist that does everything except installation here: https://gist.github.com/phansel/26677111a61a53c0c3cdbdf94ae1a66e.

A future version of exo could characterize each path in a cluster at runtime and use that to improve resource allocation or report connectivity issues (e.g. degraded cable or connector).
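As a rough sketch of that idea (not exo's actual implementation; the peer list, threshold, and iperf3 JSON field are assumptions to adjust), a periodic pairwise probe could be as simple as:

#!/bin/sh
# Probe each peer with a short iperf3 run and flag links below a bandwidth threshold.
# Assumes an iperf3 server (iperf3 -s) is already running on every peer.
PEERS="10.0.0.2 10.0.0.3"   # hypothetical node addresses
MIN_GBPS=5                  # hypothetical "degraded link" threshold
for peer in $PEERS; do
    bps=$(iperf3 -c "$peer" -t 2 -J | jq '.end.sum_received.bits_per_second')
    gbps=$(echo "$bps / 1000000000" | bc -l)
    echo "$peer: ${gbps} Gbit/s"
    if [ "$(echo "$gbps < $MIN_GBPS" | bc -l)" -eq 1 ]; then
        echo "warning: link to $peer looks degraded (cable/connector?)" >&2
    fi
done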

I'm curious what the TB4/TB5 performance looks like between a couple of Mac Mini nodes, or between a Mac Mini and a laptop on AC power vs. on its internal battery. There isn't much published data on 40 Gb/s TB4 or "80 Gb/s" TB5 latency out there.

@AlexCheema props for publishing exo!

@AlexCheema
Contributor

This is interesting -- so you're saying we should model the relationship between message size and bandwidth/latency and use that information to change how we split the model?
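(One standard way to capture that relationship is the latency-bandwidth cost model T(n) ≈ α + n/β, where α is the per-message latency and β the asymptotic bandwidth; α and β fitted per link from a NetPIPE sweep would let a scheduler estimate the cost of moving a tensor of a given size over each path. This is the usual HPC cost model offered as a suggestion, not something exo does today.)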

@phansel

This comment was marked as resolved.

@phansel
Author

phansel commented Jan 7, 2025

My hosts file wasn't correct; the work was likely being split between two processes on the same machine. The bandwidth at the largest (65536-byte) message size is still a bit unrealistically high, but it's more reasonable than the previous plots.

np.hosts needed to be in this format:

host1_ip slots=1
host2_ip slots=1

Single TB4 cable:
[plots: np-M2Ultra-oneTB4-bw, np-M2Ultra-oneTB4-latency]

Two TB4 cables:
[plots: np-M2Ultra-bw, np-M2Ultra-latency]

I also see a bit higher iperf3 speeds after re-connecting the cables: 35 Gbit/sec with one cable, 50 Gbit/sec with both. The connections have to be fully removed and re-made to hit these numbers.
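For links faster than ~10 Gbit/s a single TCP stream is often CPU-bound, so it may be worth checking whether parallel streams change the picture (a sketch; the peer address is a placeholder):

iperf3 -c <peer_ip> -P 4 -t 10   # 4 parallel streams, 10-second run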

@phansel
Author

phansel commented Jan 8, 2025

I tried to benchmark the 10G Ethernet built into these two devices, but could not get NetPIPE to progress beyond:

user@dev1 $ mpirun --hostfile np.eth.hosts NPmpi --start 1 --end 65536 -o qdr.eth          
Saving output to qdr.eth
      Clock resolution ~   1.000 usecs      Clock accuracy ~   1.000 usecs
Start testing with 7 trials for each message size
[dev2][[3076,1],1][btl_tcp_frag.c:241:mca_btl_tcp_frag_recv] peer: dev1 mca_btl_tcp_frag_recv: readv failed: Operation timed out (60)
[dev2:00000] *** An error occurred in Socket closed
[dev2:00000] *** reported by process [201588737,1]
[dev2:00000] *** on a NULL communicator
[dev2:00000] *** Unknown error
[dev2:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[dev2:00000] ***    and MPI will try to terminate your MPI job as well)
--------------------------------------------------------------------------
prterun detected that one or more processes exited with non-zero status,
thus causing the job to be terminated. The first process to do so was:

   Process name: [prterun-dev1-1436@1,1]
   Exit code:    14
--------------------------------------------------------------------------
  1:       1  B     24999 times -->  %  

iperf3 works fine and shows 9.41 Gbit/sec in either direction.

Either I'm holding it wrong or something's not behaving correctly under macOS Sequoia.
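A guess at the failure mode, given that iperf3 is fine: Open MPI's TCP transport may be picking a different interface (Thunderbolt bridge or WiFi) for some of its connections, which then time out. Pinning the TCP BTL to the 10G interface or its subnet is worth trying; the interface name and subnet below are placeholders:

mpirun --mca btl_tcp_if_include en8 --hostfile np.eth.hosts NPmpi --start 1 --end 65536 -o qdr.eth
# or restrict by subnet instead of interface name:
mpirun --mca btl_tcp_if_include 192.168.10.0/24 --hostfile np.eth.hosts NPmpi --start 1 --end 65536 -o qdr.eth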
