MEORG R Environment #51

@bschroeter

Description

Following discussions with @atteggiani this morning, we need a dedicated R module environment for the modelevaluation.org analysis package.

Background

ModelEvaluation.org is a server application hosted by UNSW which accepts model output (e.g. from CABLE), runs analyses, and generates standard plots for model intercomparison and benchmarking. This is currently facilitated by two ACCESS-NRI middleware packages:

  1. https://github.com/CABLE-LSM/benchcab, a package that compiles and runs CABLE across branches.
  2. https://github.com/CABLE-LSM/meorg_client, a REST client that interacts with the ModelEvaluation.org API to upload model output and trigger remote analyses.

The bottleneck we are facing is that the analyses run on dedicated workers on a remote VM, which cannot scale its resources flexibly as use of the service grows.

As such, there is a package of work underway to port the analysis software of MEORG to Gadi to run on the queue with flexible resource allocation facilitated by https://github.com/ACCESS-NRI/hpcpy (yet another ACCESS-NRI package).

The analysis software itself is a custom R package (called PalsR) with specific dependencies, listed below, which we would like to have pre-installed and available for use via a simple module load.

module use /g/data/vk83/modules
module load palsr/latest

The intention is to use HPCpy to submit queued jobs. Within each job we will simply load the module and trigger a local analysis, then upload the results to modelevaluation.org via the MEORG client. This will vastly reduce server traffic, allow flexible resource allocation (since jobs vary in size), and let us make use of the Gadi filesystem, where the files already reside after having been run with benchcab.
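As a rough illustration of the workflow described above, the body of such a job might look something like the following. This is only a sketch under assumptions: the helper name `render_palsr_job`, the project code `ab12`, the resource requests, and the `run_analysis.R` script are all hypothetical placeholders, and the upload step is left as a comment because the exact MEORG client invocation is not specified here.

```shell
#!/usr/bin/env bash
# Sketch only: prints a PBS job script for a local PalsR analysis.
# The project code, resource values and analysis script are hypothetical.
render_palsr_job() {
    local project="$1"          # NCI project code to charge (placeholder)
    local analysis_script="$2"  # R script that calls PalsR (placeholder)
    cat <<EOF
#!/bin/bash
#PBS -P ${project}
#PBS -q normal
#PBS -l ncpus=1
#PBS -l mem=4GB
#PBS -l walltime=01:00:00
#PBS -l storage=gdata/vk83

# Make the pre-built R environment available, as proposed in this issue.
module use /g/data/vk83/modules
module load palsr/latest

# Run the analysis locally on files already on the Gadi filesystem.
Rscript ${analysis_script}

# Upload of results via the MEORG client would go here (command not shown).
EOF
}

# Example: print the job script so it can be reviewed before submission.
render_palsr_job "ab12" "run_analysis.R"
```

The rendered text could be saved to a file and submitted to the queue; in practice HPCpy would take over the templating and submission, so the function above only stands in for that step.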

Based on the aforementioned discussion, it seems that setting up a dedicated module environment using this infrastructure makes the most sense, as R is largely installable via Conda.

Requirements

  • R
  • R-devel or equivalent R build tooling
  • make and compiler toolchain
  • NetCDF and HDF5 development libraries
  • zlib development library
  • Git
  • R packages: ncdf4, RJSONIO, plotrix, colorRamps
  • Recommended extras: colorspace, scales
  • And of course, the PalsR package itself, hosted on an external DVCS maintained by UNSW
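As a starting point for discussion, the requirements above might translate into a Conda environment roughly as follows. This is a sketch, not a tested recipe: the environment name `palsr` is hypothetical, and the package names assume conda-forge's usual `r-<lowercase name>` convention for R packages, which should be verified before use. Installing PalsR itself from the UNSW repository is not shown.

```shell
#!/usr/bin/env bash
# Sketch only: assembles a conda command for the requirements listed above.
# Package names assume conda-forge's "r-<lowercase name>" convention; verify before use.
env_name="palsr"   # hypothetical environment name
r_pkgs=(ncdf4 RJSONIO plotrix colorRamps colorspace scales)

# Non-R requirements: R itself, build tooling, NetCDF/HDF5/zlib libraries, Git.
conda_pkgs=(r-base make compilers libnetcdf hdf5 zlib git)
for pkg in "${r_pkgs[@]}"; do
    # Lowercase the R package name and add the r- prefix, e.g. RJSONIO -> r-rjsonio.
    conda_pkgs+=("r-$(printf '%s' "$pkg" | tr '[:upper:]' '[:lower:]')")
done

# Print the command rather than running it, so it can be reviewed first.
echo conda create -n "$env_name" -c conda-forge "${conda_pkgs[@]}"
```

Printing the command instead of executing it keeps the sketch safe to run anywhere; the same package list could equally be written into an environment file if that is what this repo's examples expect.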

UNSW has provided some installation instructions for their software (largely within a Docker config), which we should be able to reverse-engineer to set up a dedicated R environment to run the package.

Let's discuss further details in this issue. I would appreciate some direction on how to get started using the examples provided in this repo.
