Use k8s Volcano replicas to shrink job manifest size #1054

## Description
Support for using `replicas` to express repetitive pod configuration in `kubernetes_scheduler` was removed in https://github.com/pytorch/torchx/commit/f6907e8c089208545b95f1b8967278e399006a47

The rationale for the removal is explained [here](https://github.com/pytorch/torchx/blame/41be1d8e97825151482323faabf5cdfcdd00f973/torchx/schedulers/kubernetes_scheduler.py#L378-L380).

Unfortunately, for large jobs the manifest can easily exceed etcd's default request size limit of 1.5 MiB, and submission fails with `etcdserver: request is too large`.
Raising `max-request-bytes` is not always possible, e.g. on AWS EKS.

Currently, both job-specific environment variables and even TorchX's own environment variables contribute to breaching this limit, since they are repeated for every replica.
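To illustrate why per-replica duplication matters, here is a minimal sketch that compares the serialized size of a Volcano `Job` with one task per replica against a single task using `replicas`. The pod template, the 50 environment variables, and their 64-byte values are hypothetical stand-ins for job-specific and TorchX-injected variables, and the JSON length is only an approximation of the request size the API server sends to etcd.

```python
import json

# etcd's default --max-request-bytes is 1.5 MiB.
ETCD_DEFAULT_LIMIT = int(1.5 * 1024 * 1024)


def pod_template(n_env: int) -> dict:
    # Hypothetical pod template; n_env variables stand in for
    # job-specific and TorchX-injected environment variables.
    return {
        "spec": {
            "containers": [{
                "name": "trainer",
                "image": "example/trainer:latest",
                "env": [{"name": f"VAR_{i}", "value": "x" * 64} for i in range(n_env)],
            }],
            "restartPolicy": "Never",
        }
    }


def wrap_job(tasks: list) -> dict:
    return {"apiVersion": "batch.volcano.sh/v1alpha1", "kind": "Job", "spec": {"tasks": tasks}}


def job_one_task_per_replica(num_replicas: int, n_env: int) -> dict:
    # Every replica is its own task, so the pod template is duplicated num_replicas times.
    return wrap_job([
        {"name": f"trainer-{i}", "replicas": 1, "template": pod_template(n_env)}
        for i in range(num_replicas)
    ])


def job_with_replicas(num_replicas: int, n_env: int) -> dict:
    # A single task: the template is stored once and Volcano creates num_replicas pods from it.
    return wrap_job([{"name": "trainer", "replicas": num_replicas, "template": pod_template(n_env)}])


def manifest_bytes(job: dict) -> int:
    return len(json.dumps(job).encode("utf-8"))


if __name__ == "__main__":
    for n in (16, 256, 1024):
        per_replica = manifest_bytes(job_one_task_per_replica(n, n_env=50))
        shared = manifest_bytes(job_with_replicas(n, n_env=50))
        print(f"{n:5d} replicas: per-replica tasks {per_replica:>9,d} B, "
              f"replicas {shared:>6,d} B, over etcd limit: {per_replica > ETCD_DEFAULT_LIMIT}")
```

With one task per replica the manifest grows linearly with the replica count, while with `replicas` it stays essentially constant.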

We would like to find a way to make `replicas` work again so that the job manifest size is minimized.

## Motivation/Background
Increase the maximum cluster size (number of replicas per job) that we can support on k8s.

## Detailed Proposal
Make use of the fact that roles have many replicas that share a large part of their configuration. For example, shared configuration could be stored in a ConfigMap per node or per role, and per-pod values could be injected via the Downward API.
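A minimal sketch of the idea, assuming a single role: the shared environment lives in one ConfigMap referenced through `envFrom`, the Volcano task uses `replicas`, and per-pod identity comes from the Downward API (`metadata.name`, `metadata.namespace`). The helper names, the `<job>-<role>-env` ConfigMap naming, and the assumption that Volcano pod names follow `<job>-<task>-<index>` (so the replica index can be parsed from the pod name) are hypothetical and not part of TorchX.

```python
import json


def shared_env_configmap(job_name: str, role: str, env: dict) -> dict:
    """ConfigMap holding env vars shared by every replica of a role (hypothetical naming)."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": f"{job_name}-{role}-env"},
        "data": dict(env),
    }


def volcano_job(job_name: str, role: str, image: str, num_replicas: int, configmap_name: str) -> dict:
    container = {
        "name": role,
        "image": image,
        # Shared per-role configuration is referenced instead of inlined in the manifest.
        "envFrom": [{"configMapRef": {"name": configmap_name}}],
        # Per-pod identity is resolved at runtime via the Downward API.
        "env": [
            {"name": "POD_NAME", "valueFrom": {"fieldRef": {"fieldPath": "metadata.name"}}},
            {"name": "POD_NAMESPACE", "valueFrom": {"fieldRef": {"fieldPath": "metadata.namespace"}}},
        ],
    }
    return {
        "apiVersion": "batch.volcano.sh/v1alpha1",
        "kind": "Job",
        "metadata": {"name": job_name},
        "spec": {
            "schedulerName": "volcano",
            "minAvailable": num_replicas,
            "tasks": [{
                "name": role,
                "replicas": num_replicas,
                "template": {"spec": {"containers": [container], "restartPolicy": "Never"}},
            }],
        },
    }


def replica_index_from_pod_name(pod_name: str) -> int:
    # Assumes "<job>-<task>-<index>" pod naming: the index is the last dash-separated field.
    return int(pod_name.rsplit("-", 1)[1])


if __name__ == "__main__":
    cm = shared_env_configmap("myjob", "trainer", {"LOGLEVEL": "INFO", "NCCL_DEBUG": "WARN"})
    job = volcano_job("myjob", "trainer", "example/trainer:latest", 512, cm["metadata"]["name"])
    print(json.dumps(cm, indent=2))
    print(json.dumps(job, indent=2))
```

The ConfigMap is a separate API object, so it is stored in its own etcd request, but Kubernetes caps ConfigMap data at 1 MiB, and the ConfigMap would need to exist before the Job's pods start.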

## Alternatives
Avoid environment variables and long names throughout the configuration. Even so, the maximum supported job size will on average be significantly smaller than when using `replicas`.

## Additional context/links
* https://github.com/pytorch/torchx/commit/f6907e8c089208545b95f1b8967278e399006a47
* https://github.com/pytorch/torchx/blame/41be1d8e97825151482323faabf5cdfcdd00f973/torchx/schedulers/kubernetes_scheduler.py#L378-L380
* https://kubernetes.io/docs/concepts/configuration/configmap
* https://kubernetes.io/docs/concepts/workloads/pods/downward-api/