Move ZMQ monitoring radio initialization into HTEX #3813

benclifford · 2025-03-14T14:49:18Z

This is part of an ongoing project to move monitoring message forwarding into relevant executors, with executors only initializing whichever routers are needed.

In this step of the project:

The ZMQ radio/router is only used by the high throughput executor to receive messages from the interchange. This PR makes the initialization of that look more like other interchange<->submit side ZMQ initialiation, rather than look like monitoring system initialization.

This does not stop other future components which want to use monitoring-over-ZMQ: those components would initialize ZMQ routing themselves.

API-wise:

Parsl executors lose the hub_zmq_port attribute which was configured by the DFK after initialising monitoring.

As a replacement, Parsl executors gain a monitoring_messages attribute, which is the multiprocessing.Queue where executors should send monitoring messages when they have received them from their own components. This queue already existed between monitoring components, but is now exposed as an interface for other Parsl components.

Configuration-wise:

Several ZMQ-related monitoring configuration options now come from HTEX configuration, rather than from MonitoringHub configuration. There are no new configuration options.

Choice of ZMQ port now comes from the same range as other HTEX ZMQ ports, rather than from the port range in MonitoringHub. This means the minimum number of available ports there is now 4, instead of 3.
ZMQ router log files now go into the High Throughput Executor's log directory, usually a subdirectory of the main log directory.
ZMQ log verbosity is configured by High Throughput Executor's worker_debug configuration parameter, rather than by MonitoringHub.monitoring_debug.
The ZMQ router will now listen on the interchange loopback address, configured by HighThroughputExecutor.loopback_address, rather than on all addresses.

Type of change

New feature
Code maintenance/cleanup

This is part of an ongoing project to move monitoring message forwarding into relevant executors, with executors only initializing whichever routers are needed. In this step of the project: The ZMQ radio/router is only used by the high throughput executor to receive messages from the interchange. This PR makes the initialization of that look more like other interchange<->submit side ZMQ initialiation, rather than look like monitoring system initialization. This does not stop other future components which want to use monitoring-over-ZMQ: those components would initialize ZMQ routing themselves. API-wise: Parsl executors lose the hub_zmq_port attribute which was configured by the DFK after initialising monitoring. As a replacement, Parsl executors gain a monitoring_messages attribute, which is the multiprocessing.Queue where executors should send monitoring messages when they have received them from their own components. This queue already existed between monitoring components, but is now exposed as an interface for other Parsl components. Configuration-wise: Several ZMQ-related monitoring configuration options now come from HTEX configuration, rather than from MonitoringHub configuration. There are no new configuration options. * Choice of ZMQ port now comes from the same range as other HTEX ZMQ ports, rather than from the port range in MonitoringHub. This means the minimum number of available ports there is now 4, instead of 3. * ZMQ router log files now go into the High Throughput Executor's log directory, usually a subdirectory of the main log directory. * ZMQ log verbosity is configured by High Throughput Executor's worker_debug configuration parameter, rather than by MonitoringHub.monitoring_debug. * The ZMQ router will now listen on the interchange loopback address, configured by HighThroughputExecutor.loopback_address, rather than on all addresses.

khk-globus

Looks generally good; a couple of inline thoughts, questions, and/or suggestions.

khk-globus · 2025-03-19T19:11:04Z

parsl/executors/base.py

+    The DataFlowKernel will set these two attributes before calling .start(),
+    if monitoring is enabled:


Stylistically, I might switch the clauses. (But by no means a blocker.)

If monitoring is enabled, the DataFlowKernel will set the following two ...

khk-globus · 2025-03-19T19:12:39Z

parsl/executors/base.py

+        monitoring_messages: Optional[Queue[TaggedMonitoringMessage]] - an executor
+            can send messages to the monitoring hub by putting them into
+            this queue.
+
+        submit_monitoring_radio: Optional[MonitoringRadioSender] - an executor can
+            send messages to the monitoring hub by sending them using this sender.
    """


I get that it's what's in place now, but if typing is included in the function signature, I've gotten to not including it in the list of documented parameters, so that there's a single point of authority and/or a single place to update later. A thought.

khk-globus · 2025-03-20T03:05:26Z

parsl/tests/test_monitoring/test_radio_zmq.py

+                           loopback_address=loopback,
+                           port_range=(49152, 65535),


For clarity, for those who may not be inclined toward networks, should we consider making this specific tuple/range a constant within Parsl? I'm thinking both "perhaps it doesn't matter, this is only likely to be used by developers" and "this is how students and younger developers learn of these seemingly random details."

I'm not sure where it would live if we did include it. Perhaps arbitrarily in the top-level Parsl namespace? Or addresses.py?

# parsl/__init__.py or parsl/addresses.py? ... DYNAMIC_PORT_RANGE = (0xc000, 0xffff) ...

theres an RFC for ephemeral ports https://datatracker.ietf.org/doc/html/rfc6056#section-2.1 and I separately had been thinking about making a parsl.utils constant for that and moving everything over. seeing as you have also gone along roughly the same path, you or me can make that tidyup.

khk-globus · 2025-03-20T03:10:17Z

parsl/monitoring/radios/zmq_router.py

+        comm_q.close()
+        comm_q.join_thread()


Ought these be in the finally block? Or otherwise no skipped in the exceptional case?

they were not in the finally block in the code this is moved from. I'd prefer to not get into peturbing that shutdown dance here. It is probably the right thing to do though.

parsl/monitoring/radios/zmq_router.py

benclifford force-pushed the benc-monitoring-router-in-interchange branch 7 times, most recently from e96d0f1 to 10ad8f2 Compare March 19, 2025 13:43

benclifford closed this Mar 19, 2025

benclifford force-pushed the benc-monitoring-router-in-interchange branch from 10ad8f2 to 3e10e38 Compare March 19, 2025 18:41

benclifford reopened this Mar 19, 2025

benclifford marked this pull request as ready for review March 19, 2025 19:21

benclifford requested a review from khk-globus March 19, 2025 19:21

khk-globus approved these changes Mar 20, 2025

View reviewed changes

benclifford added this pull request to the merge queue Mar 20, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 20, 2025

benclifford added this pull request to the merge queue Mar 20, 2025

Merged via the queue into master with commit d79e7b8 Mar 20, 2025
11 checks passed

benclifford deleted the benc-monitoring-router-in-interchange branch March 20, 2025 12:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Move ZMQ monitoring radio initialization into HTEX #3813

Move ZMQ monitoring radio initialization into HTEX #3813

benclifford commented Mar 14, 2025

khk-globus left a comment

khk-globus Mar 19, 2025

khk-globus Mar 19, 2025

khk-globus Mar 20, 2025

benclifford Mar 20, 2025

khk-globus Mar 20, 2025

benclifford Mar 20, 2025

		The DataFlowKernel will set these two attributes before calling .start(),
		if monitoring is enabled:

Move ZMQ monitoring radio initialization into HTEX #3813

Move ZMQ monitoring radio initialization into HTEX #3813

Conversation

benclifford commented Mar 14, 2025

Type of change

khk-globus left a comment

Choose a reason for hiding this comment

khk-globus Mar 19, 2025

Choose a reason for hiding this comment

khk-globus Mar 19, 2025

Choose a reason for hiding this comment

khk-globus Mar 20, 2025

Choose a reason for hiding this comment

benclifford Mar 20, 2025

Choose a reason for hiding this comment

khk-globus Mar 20, 2025

Choose a reason for hiding this comment

benclifford Mar 20, 2025

Choose a reason for hiding this comment