Skip to content

Commit 0e6365f

Browse files
committed
Add Python examples: two-process and queue-based flow control
- nixl_api_2proc.py: Basic two-process example showing both initialize_xfer and make_prepped_xfer modes - nixl_sender_receiver.py: Advanced queue-based flow control with head/tail pointers - Utility functions for metadata exchange and memory operations - Comprehensive documentation for both examples
1 parent 4211b39 commit 0e6365f

File tree

7 files changed

+1733
-0
lines changed

7 files changed

+1733
-0
lines changed

examples/python/NIXL_API_2PROC.md

Lines changed: 320 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,320 @@
1+
# NIXL Two-Process Example
2+
3+
## Overview
4+
5+
A **basic two-process communication pattern** using NIXL demonstrating fundamental data transfer operations. This example shows both `initialize_xfer` and `make_prepped_xfer` transfer modes in a simple target-initiator pattern.
6+
7+
**Key Features:**
8+
- Two transfer modes (initialize vs. prepared)
9+
- Target-initiator pattern
10+
- Synchronous transfer completion
11+
- Reusable utility functions
12+
- Simple TCP-based metadata exchange
13+
14+
---
15+
16+
## Quick Start
17+
18+
### Usage
19+
20+
```bash
21+
# Run the example (assumes NIXL is properly installed)
22+
python3 nixl_api_2proc.py
23+
```
24+
25+
**Expected Output:**
26+
```
27+
[main] Starting TCP metadata server...
28+
[main] Starting target and initiator processes...
29+
[target] Starting target process
30+
[initiator] Starting initiator process
31+
[target] Memory registered
32+
[target] Published metadata and xfer descriptors to TCP server
33+
[target] Waiting for transfers...
34+
[initiator] Memory registered
35+
[initiator] Waiting for target metadata...
36+
[initiator] Loaded remote agent: target
37+
[initiator] Successfully retrieved target descriptors
38+
[initiator] Starting transfer 1 (READ)...
39+
[initiator] Initial transfer state: PROC
40+
[initiator] Transfer 1 done
41+
[target] Transfer 1 done
42+
[initiator] Starting transfer 2 (WRITE)...
43+
[initiator] Transfer 2 done
44+
[target] Transfer 2 done
45+
[target] Target process complete
46+
[initiator] Initiator process complete
47+
[main] ✓ Test Complete - Both processes succeeded!
48+
```
49+
50+
---
51+
52+
## Architecture Summary
53+
54+
### Processes
55+
56+
**Target Process (lines 37-80):**
57+
- Allocates and registers 2 buffers (256 bytes each)
58+
- Publishes metadata and descriptors to TCP server
59+
- Waits for transfers to complete (polling `check_remote_xfer_done`)
60+
- Passive role - data is written to/read from its buffers
61+
62+
**Initiator Process (lines 83-183):**
63+
- Allocates and registers 2 buffers (256 bytes each)
64+
- Retrieves target's metadata and descriptors
65+
- Performs Transfer 1: READ (using `initialize_xfer`)
66+
- Performs Transfer 2: WRITE (using `make_prepped_xfer`)
67+
- Active role - initiates all transfers
68+
69+
### Transfer Modes
70+
71+
**Transfer 1 - Initialize Mode (lines 111-136):**
72+
```python
73+
xfer_handle_1 = nixl_agent2.initialize_xfer(
74+
"READ", agent2_xfer_descs, agent1_xfer_descs, remote_name, b"UUID1"
75+
)
76+
state = nixl_agent2.transfer(xfer_handle_1)
77+
# Poll for completion
78+
while nixl_agent2.check_xfer_state(xfer_handle_1) != "DONE":
79+
time.sleep(0.001)
80+
```
81+
82+
**Transfer 2 - Prepared Mode (lines 138-172):**
83+
```python
84+
local_prep_handle = nixl_agent2.prep_xfer_dlist(
85+
"NIXL_INIT_AGENT", [(addr3, buf_size, 0), (addr4, buf_size, 0)], "DRAM"
86+
)
87+
remote_prep_handle = nixl_agent2.prep_xfer_dlist(
88+
remote_name, agent1_xfer_descs, "DRAM"
89+
)
90+
xfer_handle_2 = nixl_agent2.make_prepped_xfer(
91+
"WRITE", local_prep_handle, [0, 1], remote_prep_handle, [1, 0], b"UUID2"
92+
)
93+
```
94+
95+
---
96+
97+
## Code Structure
98+
99+
### Phase 1: Setup (lines 37-57, 83-101)
100+
101+
**Target:**
102+
```python
103+
# Create agent with UCX backend
104+
agent_config = nixl_agent_config(backends=["UCX"])
105+
nixl_agent1 = nixl_agent("target", agent_config)
106+
107+
# Allocate memory (2 buffers, 256 bytes each)
108+
addr1 = nixl_utils.malloc_passthru(buf_size * 2)
109+
addr2 = addr1 + buf_size
110+
111+
# Create descriptors (4-tuple for registration, 3-tuple for transfer)
112+
agent1_reg_descs = nixl_agent1.get_reg_descs(agent1_strings, "DRAM")
113+
agent1_xfer_descs = nixl_agent1.get_xfer_descs(agent1_addrs, "DRAM")
114+
115+
# Register with NIXL
116+
nixl_agent1.register_memory(agent1_reg_descs)
117+
```
118+
119+
**Initiator:**
120+
```python
121+
# Create agent (uses default config)
122+
nixl_agent2 = nixl_agent("initiator", None)
123+
# Similar allocation and registration...
124+
```
125+
126+
### Phase 2: Metadata Exchange (lines 59-62, 103-109)
127+
128+
```python
129+
# Target: Publish
130+
publish_agent_metadata(nixl_agent1, "target_meta")
131+
publish_descriptors(nixl_agent1, agent1_xfer_descs, "target_descs")
132+
133+
# Initiator: Retrieve
134+
remote_name = retrieve_agent_metadata(nixl_agent2, "target_meta",
135+
timeout=10.0, role_name="initiator")
136+
agent1_xfer_descs = retrieve_descriptors(nixl_agent2, "target_descs")
137+
```
138+
139+
### Phase 3: Transfers (lines 111-172)
140+
141+
**Transfer 1 - Simple approach:**
142+
- Use `initialize_xfer()` for one-time transfers
143+
- Simpler API, creates transfer on-the-fly
144+
- Good for occasional transfers
145+
146+
**Transfer 2 - Optimized approach:**
147+
- Use `prep_xfer_dlist()` + `make_prepped_xfer()`
148+
- Pre-creates reusable transfer handles
149+
- Better for repeated transfers
150+
151+
### Phase 4: Synchronization (lines 64-75)
152+
153+
**Target waits for completion:**
154+
```python
155+
while not nixl_agent1.check_remote_xfer_done("initiator", b"UUID1"):
156+
time.sleep(0.001)
157+
```
158+
159+
**Initiator polls transfer state:**
160+
```python
161+
while nixl_agent2.check_xfer_state(xfer_handle_1) != "DONE":
162+
time.sleep(0.001)
163+
```
164+
165+
---
166+
167+
## Utility Functions
168+
169+
### From `nixl_metadata_utils.py`
170+
171+
- **`publish_agent_metadata(agent, key)`** - Publish agent metadata to TCP server
172+
- **`retrieve_agent_metadata(agent, key, timeout=10.0, role_name)`** - Retrieve remote agent (customizable timeout)
173+
- **`publish_descriptors(agent, xfer_descs, key)`** - Publish serialized descriptors
174+
- **`retrieve_descriptors(agent, key)`** - Retrieve and deserialize descriptors
175+
176+
### From `tcp_server.py`
177+
178+
- Simple key-value store for metadata exchange
179+
- Only used during setup phase
180+
- Not involved in actual data transfers
181+
182+
---
183+
184+
## Key NIXL Concepts
185+
186+
1. **Memory Registration**: `agent.register_memory(reg_descs)` before transfers
187+
2. **Agent Metadata**: Exchange via `get_agent_metadata()` and `add_remote_agent()`
188+
3. **Descriptor Types**:
189+
- **Registration descriptors**: 4-tuple `(addr, size, dev_id, name)` for memory registration
190+
- **Transfer descriptors**: 3-tuple `(addr, size, dev_id)` for actual transfers
191+
4. **Transfer Modes**:
192+
- **Initialize mode**: `initialize_xfer()` - simple, one-shot
193+
- **Prepared mode**: `prep_xfer_dlist()` + `make_prepped_xfer()` - optimized, reusable
194+
5. **Transfer Operations**:
195+
- **READ**: Initiator reads from target's memory
196+
- **WRITE**: Initiator writes to target's memory
197+
6. **Synchronization**:
198+
- **Local**: `check_xfer_state()` - check local transfer status
199+
- **Remote**: `check_remote_xfer_done()` - check if remote agent completed transfer
200+
201+
---
202+
203+
## Comparison: Initialize vs. Prepared Transfer
204+
205+
| Aspect | Initialize Mode | Prepared Mode |
206+
|--------|----------------|---------------|
207+
| **API** | `initialize_xfer()` | `prep_xfer_dlist()` + `make_prepped_xfer()` |
208+
| **Setup** | Simple, one call | More complex, two-step |
209+
| **Reusability** | One-time use | Reusable handles |
210+
| **Performance** | Good for occasional | Better for repeated |
211+
| **Use Case** | Simple transfers | High-frequency transfers |
212+
| **Example** | Transfer 1 (line 113) | Transfer 2 (line 150) |
213+
214+
---
215+
216+
## References
217+
218+
- **Advanced Example**: `nixl_sender_receiver.py` - Queue-based flow control
219+
- **Utility Functions**: `nixl_metadata_utils.py`, `nixl_memory_utils.py`
220+
- **NIXL Examples**: `nixl_api_example.py`
221+
222+
---
223+
224+
## Detailed Architecture Diagrams
225+
226+
### Setup and Metadata Exchange
227+
228+
```
229+
┌─────────────────────────────────────────────────────────────────────────────┐
230+
│ SETUP PHASE │
231+
└─────────────────────────────────────────────────────────────────────────────┘
232+
233+
TCP Metadata Server Target Process Initiator Process
234+
(Port 9998) ┌──────────────┐ ┌──────────────┐
235+
│ │ │ │ │
236+
│ │ 1. Create │ │ 1. Create │
237+
│ │ Agent │ │ Agent │
238+
│ │ (UCX) │ │ (default) │
239+
│ │ │ │ │
240+
│ │ 2. Allocate │ │ 2. Allocate │
241+
│ │ 2 buffers │ │ 2 buffers │
242+
│ │ (256B ea) │ │ (256B ea) │
243+
│ │ │ │ │
244+
│ │ 3. Register │ │ 3. Register │
245+
│ │ Memory │ │ Memory │
246+
│ │ │ │ │
247+
│◄────(publish)───┤ │ │ │
248+
│ "target_meta" │ │ │ │
249+
│ + descriptors │ │ │ │
250+
│ │ │ │ │
251+
│ │ 4. Wait for │ │ │
252+
│ │ transfers │ │ │
253+
│ │ │ │ │
254+
│─(retrieve)──────────────────────────────────►│ 4. Retrieve │
255+
│ "target_meta" + descriptors │ metadata │
256+
│ │ │ │ │
257+
│ │ │ │ 5. Add │
258+
│ │ │ │ remote │
259+
│ │ │ │ agent │
260+
│ └──────────────┘ └──────┬───────┘
261+
│ │
262+
│ Connection Established │
263+
│ │
264+
(TCP server only used for metadata exchange, not for data transfers)
265+
```
266+
267+
### Transfer Flow
268+
269+
```
270+
┌─────────────────────────────────────────────────────────────────────────────┐
271+
│ TRANSFER PHASE │
272+
└─────────────────────────────────────────────────────────────────────────────┘
273+
274+
Target Initiator
275+
(Passive) (Active)
276+
│ │
277+
│ Waiting for transfers... │
278+
│ │
279+
│ │
280+
│ ┌───────────────┴──────────┐
281+
│ │ Transfer 1: READ │
282+
│ │ Mode: initialize_xfer │
283+
│ └───────────────┬──────────┘
284+
│ │
285+
│◄──────────────────────RDMA READ (256B)─────────────────────┤
286+
│ Initiator reads from target's buffer │
287+
│ │
288+
│ check_remote_xfer_done("initiator", "UUID1") │
289+
│ → TRUE │
290+
│ │
291+
│ Transfer 1 done ───────────────────────────────────────► check_xfer_state()
292+
│ → DONE
293+
│ │
294+
│ │
295+
│ ┌───────────────┴──────────┐
296+
│ │ Transfer 2: WRITE │
297+
│ │ Mode: make_prepped_xfer │
298+
│ └───────────────┬──────────┘
299+
│ │
300+
│◄──────────────────────RDMA WRITE (512B)────────────────────┤
301+
│ Initiator writes to target's buffer │
302+
│ (crossing: writes buf2→buf1, buf1→buf2) │
303+
│ │
304+
│ check_remote_xfer_done("initiator", "UUID2") │
305+
│ → TRUE │
306+
│ │
307+
│ Transfer 2 done ───────────────────────────────────────► check_xfer_state()
308+
│ → DONE
309+
│ │
310+
│ Cleanup & exit ◄───────────────────────────────────────► Cleanup & exit
311+
│ │
312+
```
313+
314+
---
315+
316+
## License
317+
318+
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
319+
SPDX-License-Identifier: Apache-2.0
320+

0 commit comments

Comments
 (0)