Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move ZMQ monitoring radio initialization into HTEX #3813

Merged
merged 1 commit into from
Mar 20, 2025

Conversation

benclifford
Copy link
Collaborator

This is part of an ongoing project to move monitoring message forwarding into relevant executors, with executors only initializing whichever routers are needed.

In this step of the project:

The ZMQ radio/router is only used by the high throughput executor to receive messages from the interchange. This PR makes the initialization of that look more like other interchange<->submit side ZMQ initialiation, rather than look like monitoring system initialization.

This does not stop other future components which want to use monitoring-over-ZMQ: those components would initialize ZMQ routing themselves.

API-wise:

Parsl executors lose the hub_zmq_port attribute which was configured by the DFK after initialising monitoring.

As a replacement, Parsl executors gain a monitoring_messages attribute, which is the multiprocessing.Queue where executors should send monitoring messages when they have received them from their own components. This queue already existed between monitoring components, but is now exposed as an interface for other Parsl components.

Configuration-wise:

Several ZMQ-related monitoring configuration options now come from HTEX configuration, rather than from MonitoringHub configuration. There are no new configuration options.

  • Choice of ZMQ port now comes from the same range as other HTEX ZMQ ports, rather than from the port range in MonitoringHub. This means the minimum number of available ports there is now 4, instead of 3.

  • ZMQ router log files now go into the High Throughput Executor's log directory, usually a subdirectory of the main log directory.

  • ZMQ log verbosity is configured by High Throughput Executor's worker_debug configuration parameter, rather than by MonitoringHub.monitoring_debug.

  • The ZMQ router will now listen on the interchange loopback address, configured by HighThroughputExecutor.loopback_address, rather than on all addresses.

Type of change

  • New feature
  • Code maintenance/cleanup

@benclifford benclifford force-pushed the benc-monitoring-router-in-interchange branch 7 times, most recently from e96d0f1 to 10ad8f2 Compare March 19, 2025 13:43
@benclifford benclifford force-pushed the benc-monitoring-router-in-interchange branch from 10ad8f2 to 3e10e38 Compare March 19, 2025 18:41
This is part of an ongoing project to move monitoring message forwarding into
relevant executors, with executors only initializing whichever routers are
needed.

In this step of the project:

The ZMQ radio/router is only used by the high throughput executor to
receive messages from the interchange. This PR makes the initialization of
that look more like other interchange<->submit side ZMQ initialiation,
rather than look like monitoring system initialization.

This does not stop other future components which want to use
monitoring-over-ZMQ: those components would initialize ZMQ routing themselves.

API-wise:

Parsl executors lose the hub_zmq_port attribute which was configured by the
DFK after initialising monitoring.

As a replacement, Parsl executors gain a monitoring_messages attribute,
which is the multiprocessing.Queue where executors should send monitoring
messages when they have received them from their own components. This queue
already existed between monitoring components, but is now exposed as an
interface for other Parsl components.

Configuration-wise:

Several ZMQ-related monitoring configuration options now come from HTEX
configuration, rather than from MonitoringHub configuration. There are no
new configuration options.

* Choice of ZMQ port now comes from the same range as other HTEX ZMQ ports,
rather than from the port range in MonitoringHub. This means the minimum
number of available ports there is now 4, instead of 3.

* ZMQ router log files now go into the High Throughput Executor's log
directory, usually a subdirectory of the main log directory.

* ZMQ log verbosity is configured by High Throughput Executor's worker_debug
configuration parameter, rather than by MonitoringHub.monitoring_debug.

* The ZMQ router will now listen on the interchange loopback address,
configured by HighThroughputExecutor.loopback_address, rather than on all
addresses.
@benclifford benclifford reopened this Mar 19, 2025
@benclifford benclifford marked this pull request as ready for review March 19, 2025 19:21
@benclifford benclifford requested a review from khk-globus March 19, 2025 19:21
Copy link
Collaborator

@khk-globus khk-globus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks generally good; a couple of inline thoughts, questions, and/or suggestions.

Comment on lines +50 to +51
The DataFlowKernel will set these two attributes before calling .start(),
if monitoring is enabled:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stylistically, I might switch the clauses. (But by no means a blocker.)

If monitoring is enabled, the DataFlowKernel will set the following two ...

Comment on lines +53 to 59
monitoring_messages: Optional[Queue[TaggedMonitoringMessage]] - an executor
can send messages to the monitoring hub by putting them into
this queue.

submit_monitoring_radio: Optional[MonitoringRadioSender] - an executor can
send messages to the monitoring hub by sending them using this sender.
"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get that it's what's in place now, but if typing is included in the function signature, I've gotten to not including it in the list of documented parameters, so that there's a single point of authority and/or a single place to update later. A thought.

Comment on lines +13 to +14
loopback_address=loopback,
port_range=(49152, 65535),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For clarity, for those who may not be inclined toward networks, should we consider making this specific tuple/range a constant within Parsl? I'm thinking both "perhaps it doesn't matter, this is only likely to be used by developers" and "this is how students and younger developers learn of these seemingly random details."

I'm not sure where it would live if we did include it. Perhaps arbitrarily in the top-level Parsl namespace? Or addresses.py?

# parsl/__init__.py or parsl/addresses.py?
...
DYNAMIC_PORT_RANGE = (0xc000, 0xffff)
...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

theres an RFC for ephemeral ports https://datatracker.ietf.org/doc/html/rfc6056#section-2.1 and I separately had been thinking about making a parsl.utils constant for that and moving everything over. seeing as you have also gone along roughly the same path, you or me can make that tidyup.

Comment on lines +182 to +183
comm_q.close()
comm_q.join_thread()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ought these be in the finally block? Or otherwise no skipped in the exceptional case?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they were not in the finally block in the code this is moved from. I'd prefer to not get into peturbing that shutdown dance here. It is probably the right thing to do though.

@benclifford benclifford added this pull request to the merge queue Mar 20, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 20, 2025
@benclifford benclifford added this pull request to the merge queue Mar 20, 2025
Merged via the queue into master with commit d79e7b8 Mar 20, 2025
11 checks passed
@benclifford benclifford deleted the benc-monitoring-router-in-interchange branch March 20, 2025 12:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants