Round-trip And Network Timing
rant is an Emit/Reflect (request/response) "ping-pong" test that measures
network latency with nanosecond precision using hardware timestamps.
       Emit (client)       |                       |   Reflect (server)
---------------------------|-----------------------|----------------------
  T1_SW                    |                       |
    |                      |                       |
  sendto()                 |                       |
    \_________ >> T1_HW   *|========== >> =========|* T2_HW >> recvmsg()
                           |                       |               |
                           |                       |             T2_SW
                           |                       |               |
                           |                       |             T3_SW
                           |                       |               |
                           |                       |           sendto()
  recvmsg() << T4_HW      *|========== << =========|* T3_HW << ____/
    |                      |                       |
  T4_SW                    |                       |
- T1_SW: SW timestamp when userspace sends the packet
- T1_HW: HW timestamp when the packet is actually sent onto the wire
- T2_HW: HW timestamp when the packet is actually received off the wire
- T2_SW: SW timestamp when the packet is delivered to userspace
- T3_SW: SW timestamp when userspace sends the packet
- T3_HW: HW timestamp when the packet is actually sent onto the wire
- T4_HW: HW timestamp when the packet is actually received off the wire
- T4_SW: SW timestamp when the packet is delivered to userspace
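On Linux, per-packet hardware timestamps are usually delivered as ancillary data on recvmsg() once SO_TIMESTAMPING has been enabled on the socket. The sketch below shows how the raw hardware timestamp could be pulled out of the control messages. Whether rant.c does exactly this is an assumption, and the helper name extract_hw_timestamp is hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <time.h>

/* Fallback for libc headers that lack the macro; 37 is the value on
 * generic architectures such as x86-64 and arm64. */
#ifndef SCM_TIMESTAMPING
#define SCM_TIMESTAMPING 37
#endif

/*
 * Scan the control messages returned by recvmsg() for an SCM_TIMESTAMPING
 * record. Its payload is three struct timespec values (the layout of
 * struct scm_timestamping in <linux/errqueue.h>): index 0 is the software
 * timestamp and index 2 is the raw hardware timestamp.
 * Returns 0 and fills *hw on success, -1 if no hardware timestamp is found.
 */
int extract_hw_timestamp(struct msghdr *msg, struct timespec *hw)
{
    struct cmsghdr *cmsg;

    for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL; cmsg = CMSG_NXTHDR(msg, cmsg)) {
        struct timespec ts[3];

        if (cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_TIMESTAMPING)
            continue;
        if (cmsg->cmsg_len < CMSG_LEN(sizeof(ts)))
            continue;

        memcpy(ts, CMSG_DATA(cmsg), sizeof(ts));
        if (ts[2].tv_sec == 0 && ts[2].tv_nsec == 0)
            continue; /* record present, but no hardware timestamp in it */

        *hw = ts[2];
        return 0;
    }
    return -1;
}
```

For received packets (T2_HW, T4_HW) the record arrives with the data. For transmitted packets (T1_HW, T3_HW) the kernel reports it on the socket error queue, read with recvmsg() and MSG_ERRQUEUE.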
- T4_HW - T1_HW: Hardware Round-Trip Time (client measures)
- T3_HW - T2_HW: Hardware Response Time (server measures)
Software timestamps (--sw-timestamps) enable additional decomposition:
- T1_HW - T1_SW: App-to-NIC TX delay (client)
- T2_SW - T2_HW: NIC-to-App RX delay (server)
- T3_SW - T2_SW: App processing time (server)
- T3_HW - T3_SW: App-to-NIC TX delay (server)
- T4_SW - T4_HW: NIC-to-App RX delay (client)
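The client-side arithmetic can be sketched as follows. This is a minimal illustration, not code from rant.c; the struct and function names are hypothetical.

```c
#include <stdint.h>
#include <time.h>

/* Signed difference (later - earlier) in nanoseconds. */
int64_t ts_delta_ns(struct timespec later, struct timespec earlier)
{
    return (int64_t)(later.tv_sec - earlier.tv_sec) * 1000000000LL
         + (int64_t)(later.tv_nsec - earlier.tv_nsec);
}

/* Client-side view of one transaction (hypothetical layout). */
struct client_breakdown {
    int64_t tx_delay_ns; /* T1_HW - T1_SW: App-to-NIC TX delay     */
    int64_t hw_rtt_ns;   /* T4_HW - T1_HW: Hardware Round-Trip Time */
    int64_t rx_delay_ns; /* T4_SW - T4_HW: NIC-to-App RX delay     */
};

struct client_breakdown client_decompose(struct timespec t1_sw, struct timespec t1_hw,
                                         struct timespec t4_hw, struct timespec t4_sw)
{
    struct client_breakdown b;

    b.tx_delay_ns = ts_delta_ns(t1_hw, t1_sw);
    b.hw_rtt_ns   = ts_delta_ns(t4_hw, t1_hw);
    b.rx_delay_ns = ts_delta_ns(t4_sw, t4_hw);
    return b;
}
```

The hardware deltas compare two readings of the same NIC clock, so they do not depend on clock agreement between the two hosts. The mixed SW/HW deltas compare the system clock with the NIC clock and are only meaningful when the two are synchronized.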
gcc -o rant rant.c
./rant --interface <iface> [OPTIONS]
Required:
-i, --interface <iface> Network interface to use
Optional:
-a, --address <ip> Server IP address (client mode, omit for server)
-t, --threshold <us> Stop when latency exceeds threshold (microseconds)
-T, --trace-marker Enable kernel trace_marker integration
-S, --snapshot Take ftrace snapshot on threshold breach
-d, --duration <sec> Test duration in seconds
-w, --warmup <pkts> Number of warmup packets to discard from statistics
-s, --sw-timestamps Collect software timestamps (T*_SW)
-H, --histogram Show histogram summary at end of test
-l, --log <file> Write transaction log to file
-o, --overflow <us> Histogram overflow bucket threshold (default: 100us)
-b, --bucket-size <us> Histogram bucket size (default: 1us)
-G, --hugepages Use hugepages for memory allocation
-B, --budget <n> Set SO_BUSY_POLL_BUDGET (NAPI poll budget)
-P, --prefer-busypoll Set SO_PREFER_BUSY_POLL
-v, --verbose Verbose output (config, allocation, progress)
-h, --help Show help message
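The --budget and --prefer-busypoll options map to two Linux socket options. A minimal sketch of how they could be applied is shown below; the function name set_busy_poll_opts is hypothetical and the way rant.c applies them is an assumption. Both options need Linux 5.11 or later, and the kernel may refuse them without CAP_NET_ADMIN.

```c
#include <errno.h>
#include <sys/socket.h>

/* Fallbacks for older libc headers. The values are those of
 * asm-generic/socket.h (x86-64, arm64); some architectures differ. */
#ifndef SO_PREFER_BUSY_POLL
#define SO_PREFER_BUSY_POLL 69
#endif
#ifndef SO_BUSY_POLL_BUDGET
#define SO_BUSY_POLL_BUDGET 70
#endif

/*
 * Apply the busy-poll socket options to fd.
 * prefer: non-zero to set SO_PREFER_BUSY_POLL.
 * budget: NAPI poll budget for SO_BUSY_POLL_BUDGET, 0 to leave unchanged.
 * Returns 0 on success or a negative errno value.
 */
int set_busy_poll_opts(int fd, int prefer, int budget)
{
    if (prefer) {
        int one = 1;
        if (setsockopt(fd, SOL_SOCKET, SO_PREFER_BUSY_POLL, &one, sizeof(one)) < 0)
            return -errno;
    }
    if (budget > 0) {
        if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL_BUDGET, &budget, sizeof(budget)) < 0)
            return -errno;
    }
    return 0;
}
```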
sudo nsenter --net=/var/run/netns/ns_ens7f0np0 \
taskset -c 51 chrt -f 22 \
./rant -i ens7f0np0 -a 192.168.1.11 --warmup 100000 --duration 60 --threshold 30

Server (Reflect):
sudo nsenter --net=/var/run/netns/ns_ens7f1np1 \
taskset --cpu-list 61 chrt -f 22 \
./rant -i ens7f1np1 \
--warmup 100000 \
--duration 600 \
--threshold 100 \
--sw-timestamps \
--histogram \
--log response.txt \
--trace-marker \
--snapshot

Client (Emit):
sudo nsenter --net=/var/run/netns/ns_ens7f0np0 \
taskset -c 51 chrt -f 22 \
./rant -i ens7f0np0 -a 192.168.1.11 \
--warmup 100000 \
--duration 600 \
--threshold 100 \
--sw-timestamps \
--histogram \
--log roundtrip.txt \
--trace-marker \
--snapshot

Notes:
- nsenter --net= is used instead of ip netns exec to preserve access to tracefs
- taskset pins the process to an isolated CPU
- chrt -f 22 sets SCHED_FIFO priority 22 (below IRQ thread priority)
- --warmup 100000 discards the first 100K packets from statistics
Server:
Trace marker enabled for kernel tracing integration
Trace snapshot enabled
Snapshot mode: circular log buffer of 500000 records (~10s, 34.3 MB)
Memory locked in RAM (mlockall successful)
Test is complete. Duration: 600.12 s
--- Latency Statistics ---
MIN: 6875 ns | MAX: 32989 ns | AVG: 9911 ns | Total Samples: 28546206
Histogram Summary:
Samples : 28546206
Minimum : 6.88 us (#15209052)
Maximum : 32.99 us (#27246923)
Average : 9.91 us
Percentiles (us):
50th : 9 (Median)
90th : 10
95th : 10
99th : 11
99.9th : 14
99.99th : 18
Client:
--- Latency Statistics ---
MIN: 8327 ns | MAX: 34439 ns | AVG: 11363 ns | Total Samples: 28546206
Histogram Summary:
Percentiles (us):
50th : 11 (Median)
90th : 11
99th : 12
99.9th : 16
99.99th : 20
When --log <file> is specified, all transactions are written to a file at
test completion.
Server log (columns: SEQ, T2_HW, T2_SW, T3_SW, T3_HW, RESPONSE; timestamps are seconds.nanoseconds, RESPONSE is T3_HW - T2_HW in nanoseconds):
SEQ T2_HW T2_SW T3_SW T3_HW RESPONSE
------------------------------------------------------------------------------------------------------------------------------------------------------
0 1771555281.293211904 1771555281.293247670 1771555281.293248149 1771555281.293264861 52957
Client log (columns: SEQ, T1_SW, T1_HW, T4_HW, T4_SW, RTT; timestamps are seconds.nanoseconds, RTT is T4_HW - T1_HW in nanoseconds):
SEQ T1_SW T1_HW T4_HW T4_SW RTT
------------------------------------------------------------------------------------------------------------------------------------------------------
0 1771555281.293181657 1771555281.293210998 1771555281.293265767 1771555281.293287981 54769
Software timestamp columns (T*_SW) are only present when --sw-timestamps is used.
Without it, only HW timestamps and the latency delta are logged.
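For post-processing, a server log line written with --sw-timestamps can be parsed as sketched below. This is a hypothetical helper, not part of rant; it assumes the column order shown above and that every timestamp has exactly nine fractional digits, as in the sample.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_SEC 1000000000LL

/* One "seconds.nanoseconds" column. */
#define TS_FMT " %" SCNd64 ".%9" SCNd64

struct server_record {
    uint64_t seq;
    int64_t  t2_hw, t2_sw, t3_sw, t3_hw; /* nanoseconds since the clock epoch */
    int64_t  response_ns;                /* RESPONSE column */
};

/* Returns 0 on success, -1 for header, separator, or malformed lines. */
int parse_server_line(const char *line, struct server_record *r)
{
    int64_t s[4], ns[4];
    int n = sscanf(line, "%" SCNu64 TS_FMT TS_FMT TS_FMT TS_FMT " %" SCNd64,
                   &r->seq, &s[0], &ns[0], &s[1], &ns[1],
                   &s[2], &ns[2], &s[3], &ns[3], &r->response_ns);

    if (n != 10)
        return -1;

    r->t2_hw = s[0] * NSEC_PER_SEC + ns[0];
    r->t2_sw = s[1] * NSEC_PER_SEC + ns[1];
    r->t3_sw = s[2] * NSEC_PER_SEC + ns[2];
    r->t3_hw = s[3] * NSEC_PER_SEC + ns[3];
    return 0;
}
```

A quick consistency check on each parsed record is that t3_hw - t2_hw equals response_ns, which holds for the sample line above (52957 ns).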
When --snapshot and --log are both active, the log uses a circular buffer
of 500,000 records (~10 seconds at 50K packets/sec, ~34 MB). This allows
long-duration tests without excessive memory usage. Only the last ~10 seconds
of transactions are preserved in the log file, which is typically sufficient
for post-mortem analysis around a threshold breach.
Without --snapshot, the full log is allocated based on test duration.
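The circular log can be pictured with the sketch below. The record layout and all names are hypothetical; only the fixed capacity and the keep-the-newest behaviour come from the description above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record; the real one also carries the timestamps. */
struct log_record {
    uint64_t seq;
    int64_t  latency_ns;
};

struct ring_log {
    struct log_record *buf;
    size_t   capacity; /* e.g. 500000 in snapshot mode */
    uint64_t written;  /* total records ever pushed */
};

/* Allocate the whole buffer up front; returns 0 on success, -1 on failure. */
int ring_init(struct ring_log *r, size_t capacity)
{
    if (capacity == 0)
        return -1;
    r->buf = calloc(capacity, sizeof(*r->buf));
    r->capacity = capacity;
    r->written = 0;
    return r->buf ? 0 : -1;
}

/* Store a record, overwriting the oldest one once the buffer is full. */
void ring_push(struct ring_log *r, struct log_record rec)
{
    r->buf[r->written % r->capacity] = rec;
    r->written++;
}

/* Number of records currently retained. */
size_t ring_count(const struct ring_log *r)
{
    return r->written < r->capacity ? (size_t)r->written : r->capacity;
}

/* i-th oldest retained record, for 0 <= i < ring_count(r). */
const struct log_record *ring_get(const struct ring_log *r, size_t i)
{
    uint64_t first = r->written - ring_count(r);
    return &r->buf[(first + i) % r->capacity];
}
```

As a rough size check, 500,000 records in about 34.3 MB would be consistent with records of roughly 72 bytes each if MB is read as MiB; the actual record layout is not stated here.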
The --trace-marker flag integrates rant with the kernel's ftrace subsystem.
When enabled:
- rant opens /sys/kernel/tracing/instances/rant/trace_marker
- Writes RANT_TEST_START when the test begins (after warmup)
- On threshold breach: writes the latency spike details to trace_marker, then stops tracing (tracing_on = 0)
This allows correlating application-level latency spikes with kernel events (IRQ delivery, scheduling, softirq processing, etc.) captured in the same trace buffer.
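The breach sequence described above can be sketched as follows. This is an illustration under assumptions: the function names and the exact marker text are hypothetical. Only the file names trace_marker and tracing_on and the order of the two writes come from the description.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write a short string to a tracefs control file. Returns 0 or -1. */
int tracefs_write(const char *path, const char *text)
{
    size_t len = strlen(text);
    ssize_t n;
    int fd = open(path, O_WRONLY);

    if (fd < 0)
        return -1;
    n = write(fd, text, len);
    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}

/*
 * On a threshold breach: annotate the trace, then stop tracing so the
 * ring buffer keeps the events that led up to the spike.
 * instance_dir would be /sys/kernel/tracing/instances/rant.
 */
int report_breach(const char *instance_dir, long long latency_ns, long long threshold_ns)
{
    char path[512];
    char msg[128];

    /* Hypothetical message format. */
    snprintf(msg, sizeof(msg), "RANT latency %lld ns exceeds threshold %lld ns",
             latency_ns, threshold_ns);

    snprintf(path, sizeof(path), "%s/trace_marker", instance_dir);
    if (tracefs_write(path, msg) < 0)
        return -1;

    snprintf(path, sizeof(path), "%s/tracing_on", instance_dir);
    return tracefs_write(path, "0");
}
```

A latency-sensitive tool would more likely open both files once at startup and keep the descriptors, so the breach path costs only two write() calls.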
Create a dedicated ftrace instance and configure tracing before running rant:
# Create rant trace instance
mkdir -p /sys/kernel/tracing/instances/rant
# Set buffer size (16 MB per CPU is good for ~10s of trace data)
echo 16384 > /sys/kernel/tracing/instances/rant/buffer_size_kb
# Limit tracing to relevant CPUs (e.g., app CPU 61 and IRQ CPU 62)
echo 60000000,00000000 > /sys/kernel/tracing/instances/rant/tracing_cpumask
# Enable useful tracers/events
echo 1 > /sys/kernel/tracing/instances/rant/events/irq/irq_handler_entry/enable
echo 1 > /sys/kernel/tracing/instances/rant/events/irq/irq_handler_exit/enable
echo 1 > /sys/kernel/tracing/instances/rant/events/net/napi_gro_receive_entry/enable
echo 1 > /sys/kernel/tracing/instances/rant/events/sched/sched_switch/enable

The --snapshot flag triggers an ftrace snapshot when a threshold breach
occurs. Combined with --trace-marker, this captures a frozen copy of the
trace buffer at the exact moment of a latency spike.
Allocate the snapshot buffer before running rant:
echo 1 > /sys/kernel/tracing/instances/rant/snapshot

After a threshold breach, rant stops tracing and takes a snapshot. Read the data from:
# Snapshot buffer (CPU that triggered the snapshot)
cat /sys/kernel/tracing/instances/rant/snapshot
# Live trace buffer (frozen by tracing_on=0, useful for other CPUs)
cat /sys/kernel/tracing/instances/rant/per_cpu/cpu62/trace

Note: The snapshot mechanism is per-CPU: it captures the trace buffer of the CPU that writes to the snapshot file. For data from other CPUs, read from the live trace buffer (which is frozen because tracing was stopped).
When --threshold is set and a latency spike exceeds the threshold, rant
prints a detailed breakdown:
Server (Reflect):
Response latency (330151 ns) exceeds threshold (100000 ns).
T2_HW (NIC rx): 1773883524.780127138
T3_HW (NIC tx): 1773883524.780457289
T2_SW (app rx): 1773883524.780430176
T3_SW (app tx): 1773883524.780430768
NIC-to-app (T2_SW - T2_HW): 303038 ns <-- where the spike is
App-to-NIC (T3_HW - T3_SW): 26521 ns
App processing (T3_SW - T2_SW): 592 ns
This decomposition (requires --sw-timestamps) pinpoints whether the spike
is in the NIC/driver RX path, the application, or the TX path.
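The three segments add up to the response latency (303038 + 592 + 26521 = 330151 ns in the example), so locating the spike amounts to finding the largest term. A minimal sketch with hypothetical names:

```c
#include <stdint.h>

/* The three server-side segments of one response, in nanoseconds. */
struct response_breakdown {
    int64_t nic_to_app_ns; /* T2_SW - T2_HW */
    int64_t app_ns;        /* T3_SW - T2_SW */
    int64_t app_to_nic_ns; /* T3_HW - T3_SW */
};

/* Sum of the segments; equals T3_HW - T2_HW. */
int64_t breakdown_total_ns(const struct response_breakdown *b)
{
    return b->nic_to_app_ns + b->app_ns + b->app_to_nic_ns;
}

/* Name of the segment that contributes most to the response latency. */
const char *dominant_segment(const struct response_breakdown *b)
{
    const char *name = "NIC-to-app";
    int64_t max = b->nic_to_app_ns;

    if (b->app_ns > max) {
        max = b->app_ns;
        name = "App processing";
    }
    if (b->app_to_nic_ns > max)
        name = "App-to-NIC";
    return name;
}
```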
- mlockall() is called at startup to prevent page faults during the test
- Log allocation is based on test duration (estimated at 50K packets/sec)
- --hugepages uses mmap with MAP_HUGETLB for log and overflow arrays
- The histogram uses 1-byte-per-microsecond buckets (100 buckets + 1 overflow by default)
- Overflow samples (>100us) are stored separately for exact percentile calculation
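The histogram scheme in the last two items can be sketched as below: fixed 1 us buckets below the overflow threshold, plus a preallocated array holding exact overflow values so that high percentiles stay exact. All names, the counter widths, and the percentile convention are assumptions for illustration, not the layout used by rant.c.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BUCKET_NS   1000u /* 1 us per bucket (the documented default) */
#define NUM_BUCKETS 100u  /* buckets below the 100 us overflow threshold */

struct latency_hist {
    uint64_t  bucket[NUM_BUCKETS]; /* sample count per 1 us bucket */
    uint64_t *overflow_ns;         /* exact latencies at or above the threshold */
    size_t    overflow_count;
    size_t    overflow_cap;
    uint64_t  total;               /* samples recorded so far */
};

/* Preallocate everything up front so hist_add() never allocates. */
int hist_init(struct latency_hist *h, size_t overflow_cap)
{
    memset(h, 0, sizeof(*h));
    h->overflow_ns = calloc(overflow_cap ? overflow_cap : 1, sizeof(uint64_t));
    if (h->overflow_ns == NULL)
        return -1;
    h->overflow_cap = overflow_cap;
    return 0;
}

/* Record one sample; returns -1 if the overflow array is full. */
int hist_add(struct latency_hist *h, uint64_t latency_ns)
{
    uint64_t idx = latency_ns / BUCKET_NS;

    if (idx < NUM_BUCKETS) {
        h->bucket[idx]++;
    } else {
        if (h->overflow_count == h->overflow_cap)
            return -1;
        h->overflow_ns[h->overflow_count++] = latency_ns;
    }
    h->total++;
    return 0;
}

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/*
 * Percentile in whole microseconds. per_10k is the percentile times 100,
 * so 9999 means the 99.99th percentile. Sorts the overflow array in place
 * when the requested rank falls beyond the last bucket.
 */
uint64_t hist_percentile_us(struct latency_hist *h, unsigned per_10k)
{
    uint64_t rank, seen = 0;

    if (h->total == 0)
        return 0;
    rank = (h->total * per_10k + 9999) / 10000; /* ceiling division */
    if (rank == 0)
        rank = 1;
    if (rank > h->total)
        rank = h->total;

    for (size_t i = 0; i < NUM_BUCKETS; i++) {
        seen += h->bucket[i];
        if (seen >= rank)
            return i;
    }
    qsort(h->overflow_ns, h->overflow_count, sizeof(uint64_t), cmp_u64);
    return h->overflow_ns[rank - seen - 1] / 1000;
}
```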
config.sh sets up a network interface for low-latency testing: namespace
isolation, ethtool tuning, IRQ affinity, CPU isolation, and PTP clock sync.
It auto-discovers hardware properties (MAC, PCI address, driver, IRQ, PTP clock) from the interface name.
config.sh <interface> --ip <local_ip> --remote-ip <remote_ip> \
--cpu <app_cpu> --irq-cpu <irq_cpu> [options]
Required:
<interface> Network interface name
--ip <addr> Local IP address to assign (with /24)
--remote-ip <addr> Remote peer IP address
--cpu <n> CPU core for the application
--irq-cpu <n> CPU core for IRQ handling
Optional:
--remote-mac <mac> Remote MAC address for static ARP
--busy-poll <val> Busy poll value (default: 0)
--ptp-source <dev> PTP source device (auto-detected from sysfs)
--ptp-sync-to <dev> PTP device to sync to (for second port)
--no-namespace Skip namespace setup
--no-ptp Skip PTP sync setup
--irq-prio <n> IRQ thread FIFO priority (default: 55)
--ksoftirqd-prio <n> ksoftirqd FIFO priority (default: 11)
# Configure client interface (port 0)
sudo ./config.sh ens7f0np0 \
--ip 192.168.1.10 --remote-ip 192.168.1.11 \
--remote-mac 58:a2:e1:0b:21:df \
--cpu 51 --irq-cpu 52
# Configure server interface (port 1), sync PTP clocks
sudo ./config.sh ens7f1np1 \
--ip 192.168.1.11 --remote-ip 192.168.1.10 \
--remote-mac 58:a2:e1:0b:21:de \
--cpu 61 --irq-cpu 62 \
--ptp-sync-to /dev/ptp7

- Namespace: Creates ns_<ifname>, moves interface into it
- Ethtool: Single queue, checksums off, GRO/LRO/TSO off, coalescing minimized (rx-usecs=0, rx-frames=1), small ring buffers (512), pause frames off
- Driver-specific: Detects mlx5 or ice and applies appropriate private flags (CQE mode off, rx_cqe_moder off, tx_port_ts off for mlx5)
- NAPI: napi_defer_hard_irqs=0, gro_flush_timeout=0
- Network: Static ARP, IPv6 disabled, multicast off, ARP off, noqueue qdisc
- CPU: Isolates app and IRQ CPUs via tuna, sets IRQ thread to FIFO priority
- PCIe: Disables PCIe power management via setpci
- PTP: Starts phc2sys for clock synchronization
Features not yet implemented:
- Clock drift detection and warnings
- Hardware timestamp support verification