-
Notifications
You must be signed in to change notification settings - Fork 202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move ZMQ monitoring radio initialization into HTEX #3813
Conversation
e96d0f1
to
10ad8f2
Compare
10ad8f2
to
3e10e38
Compare
This is part of an ongoing project to move monitoring message forwarding into relevant executors, with executors only initializing whichever routers are needed. In this step of the project: The ZMQ radio/router is only used by the high throughput executor to receive messages from the interchange. This PR makes the initialization of that look more like other interchange<->submit side ZMQ initialiation, rather than look like monitoring system initialization. This does not stop other future components which want to use monitoring-over-ZMQ: those components would initialize ZMQ routing themselves. API-wise: Parsl executors lose the hub_zmq_port attribute which was configured by the DFK after initialising monitoring. As a replacement, Parsl executors gain a monitoring_messages attribute, which is the multiprocessing.Queue where executors should send monitoring messages when they have received them from their own components. This queue already existed between monitoring components, but is now exposed as an interface for other Parsl components. Configuration-wise: Several ZMQ-related monitoring configuration options now come from HTEX configuration, rather than from MonitoringHub configuration. There are no new configuration options. * Choice of ZMQ port now comes from the same range as other HTEX ZMQ ports, rather than from the port range in MonitoringHub. This means the minimum number of available ports there is now 4, instead of 3. * ZMQ router log files now go into the High Throughput Executor's log directory, usually a subdirectory of the main log directory. * ZMQ log verbosity is configured by High Throughput Executor's worker_debug configuration parameter, rather than by MonitoringHub.monitoring_debug. * The ZMQ router will now listen on the interchange loopback address, configured by HighThroughputExecutor.loopback_address, rather than on all addresses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks generally good; a couple of inline thoughts, questions, and/or suggestions.
The DataFlowKernel will set these two attributes before calling .start(), | ||
if monitoring is enabled: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stylistically, I might switch the clauses. (But by no means a blocker.)
If monitoring is enabled, the DataFlowKernel will set the following two ...
monitoring_messages: Optional[Queue[TaggedMonitoringMessage]] - an executor | ||
can send messages to the monitoring hub by putting them into | ||
this queue. | ||
|
||
submit_monitoring_radio: Optional[MonitoringRadioSender] - an executor can | ||
send messages to the monitoring hub by sending them using this sender. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get that it's what's in place now, but if typing is included in the function signature, I've gotten to not including it in the list of documented parameters, so that there's a single point of authority and/or a single place to update later. A thought.
loopback_address=loopback, | ||
port_range=(49152, 65535), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For clarity, for those who may not be inclined toward networks, should we consider making this specific tuple/range a constant within Parsl? I'm thinking both "perhaps it doesn't matter, this is only likely to be used by developers" and "this is how students and younger developers learn of these seemingly random details."
I'm not sure where it would live if we did include it. Perhaps arbitrarily in the top-level Parsl namespace? Or addresses.py
?
# parsl/__init__.py or parsl/addresses.py?
...
DYNAMIC_PORT_RANGE = (0xc000, 0xffff)
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
theres an RFC for ephemeral ports https://datatracker.ietf.org/doc/html/rfc6056#section-2.1 and I separately had been thinking about making a parsl.utils constant for that and moving everything over. seeing as you have also gone along roughly the same path, you or me can make that tidyup.
comm_q.close() | ||
comm_q.join_thread() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ought these be in the finally
block? Or otherwise no skipped in the exceptional case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they were not in the finally block in the code this is moved from. I'd prefer to not get into peturbing that shutdown dance here. It is probably the right thing to do though.
This is part of an ongoing project to move monitoring message forwarding into relevant executors, with executors only initializing whichever routers are needed.
In this step of the project:
The ZMQ radio/router is only used by the high throughput executor to receive messages from the interchange. This PR makes the initialization of that look more like other interchange<->submit side ZMQ initialiation, rather than look like monitoring system initialization.
This does not stop other future components which want to use monitoring-over-ZMQ: those components would initialize ZMQ routing themselves.
API-wise:
Parsl executors lose the hub_zmq_port attribute which was configured by the DFK after initialising monitoring.
As a replacement, Parsl executors gain a monitoring_messages attribute, which is the multiprocessing.Queue where executors should send monitoring messages when they have received them from their own components. This queue already existed between monitoring components, but is now exposed as an interface for other Parsl components.
Configuration-wise:
Several ZMQ-related monitoring configuration options now come from HTEX configuration, rather than from MonitoringHub configuration. There are no new configuration options.
Choice of ZMQ port now comes from the same range as other HTEX ZMQ ports, rather than from the port range in MonitoringHub. This means the minimum number of available ports there is now 4, instead of 3.
ZMQ router log files now go into the High Throughput Executor's log directory, usually a subdirectory of the main log directory.
ZMQ log verbosity is configured by High Throughput Executor's worker_debug configuration parameter, rather than by MonitoringHub.monitoring_debug.
The ZMQ router will now listen on the interchange loopback address, configured by HighThroughputExecutor.loopback_address, rather than on all addresses.
Type of change