Hi Michael,
I would like to contribute support for Open Cluster Scheduler (OCS) and Gridware Cluster Scheduler (GCS) to clustermq. Both are successors of Sun Grid Engine (SGE) and Univa Grid Engine (UGE).
- OCS (Open Source): https://github.com/hpc-gridware/clusterscheduler
- GCS (Commercial / Enterprise): https://hpc-gridware.com/
Background
While testing the existing SGE backend, I observed that it does not work out of the box with OCS/GCS and requires adjustments.
We currently have a prospect who intends to use clustermq on both OCS and GCS clusters concurrently. For this reason, we require distinct scheduler templates for each system. I have implemented the necessary changes in a forked repository and created product-specific templates for both OCS and GCS.
Before opening a pull request, I would like to clarify a few implementation questions and discuss design considerations.
Questions
1. Array Job Behavior and Worker Cancellation
In SGE-like systems, workloads are typically scheduled as array jobs. This means that workers belonging to a single Q() call may not start simultaneously. As a result, there can be situations where some workers remain queued even though the workload has already completed.
It appears that remaining queued workers are not automatically canceled when the master process finishes.
- Is there currently a mechanism to cancel remaining queued workers?
- If not, where would be the most appropriate place in the architecture to implement such cancellation logic?
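To illustrate where I imagine this logic fitting: something equivalent to the following step, run when the master shuts down, would identify the still-queued array tasks so they can be handed to qdel. The qstat output below is a fabricated sample (on a real cluster it would come from `qstat -g d`), and the job/task ids are only placeholders:

```shell
# Fabricated sample of per-task qstat output for array job 4711:
# fields are job-id, priority, name, user, state, date, task-id.
qstat_sample='4711 0.55 worker alice r  2024-01-01 1
4711 0.55 worker alice qw 2024-01-01 7
4711 0.55 worker alice qw 2024-01-01 8'

# Collect the task ids still waiting in state "qw"; these are the tasks
# that would be removed, e.g. via: qdel 4711 -t <task-id>
queued=$(printf '%s\n' "$qstat_sample" | awk '$5 == "qw" {print $7}')
echo "$queued"   # prints 7 and 8, the tasks left in the queue
```

Whether this belongs in the per-scheduler backend class or in a shared shutdown path is exactly what I would like guidance on.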
2. The cores Parameter in SGE and PBS Backends
The SGE and PBS backends use a cores parameter to request multiple cores per worker job (possibly via a parallel environment). I would like clarification on:
- When and how is this parameter used?
- Does this relate to MPI-based R workloads?
- Or is it primarily intended to allocate multiple scheduler slots/cores per clustermq worker process?
From my observation, Q() does not expose a cores argument directly. How and where is this parameter expected to be configured?
As I am relatively new to R, I would also appreciate insight into whether there are common R workloads requiring MPI-style parallelism within a clustermq worker, or whether this parameter primarily controls resource allocation.
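For context, here is how I currently read the mechanism (and I may be wrong, which is part of the question): cores is not an argument of Q() itself but a key passed through Q()'s template list, which fills a placeholder in the scheduler template. The fragment below is my own sketch; the parallel environment name "smp" is a site-specific assumption, and the values 10 and 4 are filled in by hand only to show the intended result:

```shell
# Sketch of an SGE-style clustermq template fragment with mustache-like
# placeholders; "{{ cores | 1 }}" would default to 1 when cores is unset.
template='#$ -t 1-{{ n_jobs }}
#$ -pe smp {{ cores | 1 }}'

# Simulate filling n_jobs=10 and cores=4 (in R this would come from
# something like Q(..., n_jobs=10, template=list(cores=4))):
filled=$(printf '%s\n' "$template" | sed -e 's/{{ n_jobs }}/10/' -e 's/{{ cores | 1 }}/4/')
printf '%s\n' "$filled"
```

If that reading is correct, my question reduces to: what is the intended consumer of those extra slots inside a single worker process?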
3. CPU and Memory Binding Features (GCS)
GCS provides enterprise binding features that allow binding jobs to specific hardware units such as:
- Hardware threads
- CPU cores
- Dies
- Sockets
- CPU caches
- NUMA nodes
This enables the scheduler to place jobs on hardware resources sharing cache or memory locality, which can significantly improve performance for certain workloads.
Does clustermq currently provide a way to expose or control scheduler-level binding features?
If not, would it be acceptable to extend Q() with an optional parameter that allows users to specify binding requirements for their jobs?
In some scenarios, assigning one CPU core per worker is not optimal; binding to a NUMA node or cache domain may give better performance characteristics, though I am unsure whether this applies to typical R workloads.
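To make the proposal concrete, here is a sketch of what exposing binding in a template could look like. The placeholder name "binding" and the default value are my assumptions; "-binding linear:1" itself is real Grid Engine syntax (bind one core per job, assigned linearly):

```shell
# Sketch of an optional binding request line in a GCS template; the user
# could override the value via the template list, e.g. "linear:4" or a
# NUMA-oriented strategy.
template='#$ -binding {{ binding | linear:1 }}'

# With no user-supplied value, the default after the "|" would apply:
filled=$(printf '%s\n' "$template" | sed 's/{{ binding | linear:1 }}/linear:1/')
printf '%s\n' "$filled"
```

An alternative would be a dedicated Q() parameter rather than a free-form template key; I am open to either design.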
Next Steps
I would be happy to share my fork and the proposed changes for review, and would appreciate guidance on the questions above before opening a pull request.