Hi Michael,
I would like to contribute support for Open Cluster Scheduler (OCS) and Gridware Cluster Scheduler (GCS) to clustermq. Both are successors of Sun Grid Engine (SGE) and Univa Grid Engine (UGE).
- OCS (Open Source): https://github.com/hpc-gridware/clusterscheduler
- GCS (Commercial / Enterprise): https://hpc-gridware.com/
Background
While testing the existing SGE backend, I observed that it does not work out of the box with OCS/GCS and requires adjustments.
We currently have a prospect who intends to use clustermq on both OCS and GCS clusters concurrently. For this reason, we require distinct scheduler templates for each system. I have implemented the necessary changes in a forked repository and created product-specific templates for both OCS and GCS.
Before opening a pull request, I would like to clarify a few implementation questions and discuss design considerations.
Questions
1. Array Job Behavior and Worker Cancellation
In SGE-like systems, workloads are typically scheduled as array jobs. This means that workers belonging to a single Q() call may not start simultaneously. As a result, there can be situations where some workers remain queued even though the workload has already completed.
It appears that remaining queued workers are not automatically canceled when the master process finishes.
- Is there currently a mechanism to cancel remaining queued workers?
- If not, where would be the most appropriate place in the architecture to implement such cancellation logic?
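To illustrate where I imagine this logic fitting: something equivalent to the following step, run when the master shuts down, would identify the still-queued array tasks so they can be handed to qdel. The qstat output below is a fabricated sample (on a real cluster it would come from `qstat -g d`), and the job/task ids are only placeholders:

```shell
# Fabricated sample of per-task qstat output for array job 4711:
# fields are job-id, priority, name, user, state, date, task-id.
qstat_sample='4711 0.55 worker alice r  2024-01-01 1
4711 0.55 worker alice qw 2024-01-01 7
4711 0.55 worker alice qw 2024-01-01 8'

# Collect the task ids still waiting in state "qw"; these are the tasks
# that would be removed, e.g. via: qdel 4711 -t <task-id>
queued=$(printf '%s\n' "$qstat_sample" | awk '$5 == "qw" {print $7}')
echo "$queued"   # prints 7 and 8, the tasks left in the queue
```

Whether this belongs in the per-scheduler backend class or in a shared shutdown path is exactly what I would like guidance on.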
2. The cores Parameter in SGE and PBS Backends
The SGE and PBS backends use a cores parameter to request multiple cores per worker job (possibly via a parallel environment). I would like clarification on:
- When and how is this parameter used?
- Does this relate to MPI-based R workloads?
- Or is it primarily intended to allocate multiple scheduler slots/cores per clustermq worker process?
From my observation, Q() does not expose a cores argument directly. How and where is this parameter expected to be configured?
As I am relatively new to R, I would also appreciate insight into whether there are common R workloads requiring MPI-style parallelism within a clustermq worker, or whether this parameter primarily controls resource allocation.
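For context, here is how I currently read the mechanism (and I may be wrong, which is part of the question): cores is not an argument of Q() itself but a key passed through Q()'s template list, which fills a placeholder in the scheduler template. The fragment below is my own sketch; the parallel environment name "smp" is a site-specific assumption, and the values 10 and 4 are filled in by hand only to show the intended result:

```shell
# Sketch of an SGE-style clustermq template fragment with mustache-like
# placeholders; "{{ cores | 1 }}" would default to 1 when cores is unset.
template='#$ -t 1-{{ n_jobs }}
#$ -pe smp {{ cores | 1 }}'

# Simulate filling n_jobs=10 and cores=4 (in R this would come from
# something like Q(..., n_jobs=10, template=list(cores=4))):
filled=$(printf '%s\n' "$template" | sed -e 's/{{ n_jobs }}/10/' -e 's/{{ cores | 1 }}/4/')
printf '%s\n' "$filled"
```

If that reading is correct, my question reduces to: what is the intended consumer of those extra slots inside a single worker process?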
3. CPU and Memory Binding Features (GCS)
GCS provides enterprise binding features that allow binding jobs to specific hardware units such as:
- Hardware threads
- CPU cores
- Dies
- Sockets
- CPU caches
- NUMA nodes
This enables the scheduler to place jobs on hardware resources sharing cache or memory locality, which can significantly improve performance for certain workloads.
Does clustermq currently provide a way to expose or control scheduler-level binding features?
If not, would it be acceptable to extend Q() with an optional parameter that allows users to specify binding requirements for their jobs?
In some scenarios, assigning one CPU core per worker is not optimal; binding to a NUMA node or cache domain may give better performance characteristics, though I am unsure whether this applies to typical R workloads.
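To make the proposal concrete, here is a sketch of what exposing binding in a template could look like. The placeholder name "binding" and the default value are my assumptions; "-binding linear:1" itself is real Grid Engine syntax (bind one core per job, assigned linearly):

```shell
# Sketch of an optional binding request line in a GCS template; the user
# could override the value via the template list, e.g. "linear:4" or a
# NUMA-oriented strategy.
template='#$ -binding {{ binding | linear:1 }}'

# With no user-supplied value, the default after the "|" would apply:
filled=$(printf '%s\n' "$template" | sed 's/{{ binding | linear:1 }}/linear:1/')
printf '%s\n' "$filled"
```

An alternative would be a dedicated Q() parameter rather than a free-form template key; I am open to either design.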
Next Steps
I would be happy to share my fork and the proposed changes for review, and would appreciate guidance on the questions above before opening a pull request.