JOBFS quota needs to be increased #400

@manodeep

An AMIP config run (modified to run on 4 nodes) got killed overnight with the following error message:

laboratory path:  /scratch/tm70/ms2335/access-esm
binary path:  /scratch/tm70/ms2335/access-esm/bin
input path:  /scratch/tm70/ms2335/access-esm/input
work path:  /scratch/tm70/ms2335/access-esm/work
archive path:  /scratch/tm70/ms2335/access-esm/archive
Found experiment archive: /scratch/tm70/ms2335/access-esm/archive/20260123-dev-amip-no-map-by-numa-dev-amip-7b152fbd
nruns: 2 nruns_per_submit: 1 subrun: 1
payu: Found modules in /opt/Modules/v4.3.0
Loading input manifest: manifests/input.yaml
Loading restart manifest: manifests/restart.yaml
Loading exe manifest: manifests/exe.yaml
Setting up atmosphere
Checking exe, input and restart manifests
Updating land use for year 1979
Job 159069506.gadi-pbs killed due to exceeding jobfs quota. Quota: 375.0MB, Used: 1.0GB, Host: gadi-cpu-spr-0145

======================================================================================
                  Resource Usage on 2026-01-23 20:12:22:
   Job Id:             159069506.gadi-pbs
   Project:            tm70
   Exit Status:        271 (Linux Signal 15 SIGTERM Termination)
   Service Units:      653.12
   NCPUs Requested:    416                 CPU Time Used: 310:10:34       
   Memory Requested:   2.0TB                 Memory Used: 130.48GB        
   Walltime Requested: 06:30:00            Walltime Used: 00:47:06        
   JobFS Requested:    1.46GB                 JobFS Used: 1.0GB           
======================================================================================

The released 2-node AMIP config is (presumably) not affected; noting this mostly for myself in case we increase the number of nodes for the AMIP config.

The error also appears to be stochastic: I have successfully finished quite a few other 4-node runs (and I am not really sure how the jobfs requirement can be stochastic).

The no-risk change would be to increase the current (very low) jobfs request to something reasonable and avoid this issue, especially since it occurs randomly; a sketch of the change is below.
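
For reference, a minimal sketch of what that could look like in the experiment's config.yaml, assuming payu's qsub_flags option is used to pass the resource request through to PBS; the 4GB value is illustrative, chosen to sit comfortably above the ~1.0GB of jobfs actually used, and has not been tested against this config:

    # config.yaml (illustrative snippet)
    # Pass an explicit jobfs request through to PBS,
    # well above the ~1.0GB observed usage.
    qsub_flags: -l jobfs=4GB

For a hand-written PBS submission script, the equivalent would be a `#PBS -l jobfs=4GB` directive.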
