Compatibility-first change policy
A high-priority goal of this refactor is to preserve current usage patterns,
existing workflows, and current functionality unless change is absolutely
required.
Where possible, the refactor should preserve both:
- interface compatibility
- functional behavior and supported capabilities
If a breaking or behavior-changing modification is unavoidable, it must be
explicitly justified and accompanied by a migration plan.
Refactor
This is an incremental refactor of CIME to improve:
- dependency injection
- resiliency and fault recovery
- modularity and maintainability
- testability
- compatibility-preserving internal evolution
The proposed package layout uses the following structure as a baseline:
CIME/api/ -> User-facing classes, e.g. Case, Downloader, etc.
CIME/cli/
CIME/core/
CIME/core/build/
CIME/core/baseline/
CIME/core/batch/
CIME/core/config/
CIME/core/compare/
CIME/core/downloader/
CIME/core/locking/
CIME/core/mods/
CIME/core/namelist/
CIME/core/status/
CIME/core/timing/
CIME/core/tools/
CIME/core/xml/
CIME/data/
CIME/non_py
CIME/build_scripts
CIME/SystemsTests/
CIME/tests/
This layout is a starting point, not a hard limit. Additional directories under
CIME/core/ may be added as needed.
The refactor is explicitly designed to preserve compatibility for:
- symlinked tools created in case directories
- external scripts using build_scripts
- external models that integrate with CIME, including:
  - E3SM (E3SM-Project/E3SM)
  - CESM (ESCOMP/CESM)
  - NorESM (NorESMhub/NorESM)
Constraints and assumptions
CIME is not currently an installed package
Today, CIME is not generally consumed as an installed Python package. Instead,
cases create symlinked tools in the case directory, and those tools must modify
sys.path so they can import CIME.
This behavior must continue to work until a packaging transition is actually
adopted. This RFC does not assume immediate installed-package semantics.
CLI is optional
A CIME/cli/ layer is allowed in the design, but implementing it is optional.
The architecture should support it, not require it.
build_scripts must remain
There are external scripts that use build_scripts. The directory must remain
as a compatibility surface.
Logic may move into CIME/core/build/, but CIME/build_scripts must remain as
a stable wrapper or adapter layer.
External model compatibility is a core requirement
CIME is used by external models, notably E3SM, CESM, and NorESM. Internal
refactors must minimize behavioral impact on these downstream consumers.
Baseline layout, not final taxonomy
The provided directory layout is the baseline. Additional CIME/core/*
subpackages may be added if they improve boundaries.
Goals
Primary goals
- Improve dependency injection around side-effecting infrastructure
- Improve resiliency of build, submit, run, and recovery workflows
- Reduce hidden global state and implicit runtime coupling
- Preserve external-facing behavior where possible
- Separate user-facing API from core orchestration logic
- Make current non-installed usage safer and more structured
Secondary goals
- Prepare for a future installed-package model without requiring it now
- Keep symlinked tool behavior working during transition
- Preserve build_scripts compatibility for outside consumers
- Make integration boundaries clearer for external models
Non-goals
This RFC does not require:
- immediate conversion to an installed package
- immediate introduction of a CLI layer
- removal of symlinked case tools
- breaking changes to build_scripts
- breaking changes for external model integrations
- wholesale rewrite of CIME internals in a single step
Compatibility-first principles
Preserve external model behavior
External models such as E3SM, CESM, and NorESM are first-class
compatibility targets.
Changes should preserve current invocation patterns unless a migration path is
explicitly provided.
Preserve symlinked case tools
Case-created symlinked tools must continue to run even when CIME is not
installed.
Preserve build_scripts entrypoints
CIME/build_scripts remains externally visible and stable, even if its logic is
internally delegated.
Internal evolution behind stable surfaces
Move implementation into CIME/core/*, but keep stable wrappers in:
- CIME/api/
- CIME/build_scripts
- legacy tool locations as needed
Architectural model
Baseline layout
CIME/
    api/
    cli/                # optional
    core/
        build/
        baseline/
        batch/
        config/
        compare/
        downloader/
        locking/
        mods/
        namelist/
        status/
        timing/
        tools/
        xml/
        ...             # additional core dirs allowed
    data/
    non_py/
    build_scripts/
    SystemsTests/
    tests/
Recommended additional core/ directories
Not required immediately, but likely useful:
- CIME/core/case/
- CIME/core/runtime/
- CIME/core/plugins/
- CIME/core/logging/
These help avoid turning existing buckets like tools/ or config/ into
miscellaneous catch-alls.
User-facing and compatibility layers
CIME/api/
Purpose:
- stable user-facing classes
- preserve conceptual public API
- delegate to core services
Expected examples:
- Case
- Downloader
CIME/api/ should contain facade classes, not core workflow logic.
CIME/build_scripts/
Purpose:
- preserve compatibility for outside scripts
- remain importable or invokable by current consumers
- delegate implementation to CIME/core/build/
This directory stays in place even if most logic moves elsewhere.
Optional CIME/cli/
Purpose:
- possible future thin entrypoints
- not required by this RFC
If implemented, it should remain thin:
- parse args
- compose services
- call use cases
- map exceptions to exit codes
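As a rough illustration of those four responsibilities, a thin entrypoint might look like the sketch below. Everything named here is hypothetical and chosen only for illustration: `run_case_setup`, `CaseSetupError`, the `--case-root` flag, and the exit code value are not existing CIME interfaces.

```python
import argparse
import sys
from typing import Callable, Optional, Sequence


class CaseSetupError(Exception):
    """Hypothetical typed error raised by the use case."""


def run_case_setup(case_root: str) -> None:
    """Hypothetical use case; real logic would live under CIME/core/."""
    if not case_root:
        raise CaseSetupError("case root must not be empty")


def main(
    argv: Optional[Sequence[str]] = None,
    use_case: Callable[[str], None] = run_case_setup,
) -> int:
    # 1. Parse args (argv=None makes argparse read sys.argv[1:]).
    parser = argparse.ArgumentParser(prog="case-setup-sketch")
    parser.add_argument("--case-root", default=".")
    args = parser.parse_args(argv)

    # 2. Compose services: here the only "service" is the injected use case.
    # 3. Call the use case.
    try:
        use_case(args.case_root)
    except CaseSetupError as exc:
        # 4. Map typed exceptions to exit codes.
        print(f"error: {exc}", file=sys.stderr)
        return 2
    return 0
```

An entrypoint script would then contain little more than `sys.exit(main())`, which keeps all workflow logic importable and testable without spawning a process.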
Managing the non-installed package model
This is one of the most important parts of the RFC.
Current reality
Symlinked tools inside a case directory need to locate the CIME source tree and
mutate sys.path in order to import CIME.
This is currently necessary and must continue to work until CIME is packaged and
installed consistently.
Problem
Today, sys.path manipulation appears in multiple places and is mixed with
general-purpose logic, which creates:
- inconsistent import behavior
- process-global mutation
- brittle assumptions about current working directory and source layout
- difficult debugging for external consumers
Proposed compatibility design
Instead of eliminating path setup immediately, centralize it.
The canonical location for this logic will be:
CIME/core/config/bootstrap.py
Responsibilities of the bootstrap layer:
- determine CIME source root from:
  - symlink target location
  - script location
  - case metadata
  - known relative layouts
- add only the minimum required entries to sys.path
- do so in one controlled place
- expose a single helper for legacy scripts and symlinked tools
Conceptual API:
def ensure_cime_on_path(script_path=None, case_root=None):
    """
    Ensure the source-tree CIME package is importable in a non-installed setup.

    Returns the resolved CIME root.
    """

Design rule
Until CIME is packaged and installed consistently, controlled path bootstraps
are acceptable, but ad hoc sys.path manipulation scattered through the
codebase is not.
All new bootstrap or path-resolution logic should go through
CIME/core/config/bootstrap.py.
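One possible shape for such a helper is sketched below. It covers only the symlink-target and script-location cases. It assumes the source root can be recognized by a `CIME/__init__.py` file, and it leaves the case-metadata lookup out entirely. `find_cime_root` is a hypothetical helper name, not an existing function.

```python
import sys
from pathlib import Path
from typing import Optional


def find_cime_root(start: Path) -> Optional[Path]:
    """Walk upward from `start` until a directory containing CIME/__init__.py is found."""
    # resolve() follows symlinks, so a tool symlinked into a case directory
    # is mapped back to its real location in the source tree.
    start = start.resolve()
    for candidate in [start, *start.parents]:
        if (candidate / "CIME" / "__init__.py").is_file():
            return candidate
    return None


def ensure_cime_on_path(script_path=None, case_root=None):
    """Sketch: resolve the CIME root and add it to sys.path at most once."""
    # case_root is accepted for signature compatibility; resolving the root
    # from case metadata is intentionally not implemented in this sketch.
    if script_path is not None:
        root = find_cime_root(Path(script_path))
        if root is not None:
            entry = str(root)
            if entry not in sys.path:  # avoid duplicate entries on repeat calls
                sys.path.insert(0, entry)
            return root
    raise RuntimeError("could not locate the CIME source tree")
```

A symlinked tool would call `ensure_cime_on_path(script_path=__file__)` before its first `import CIME`, so the path mutation happens in exactly one place.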
Long-term transition path
When or if CIME becomes installable:
- bootstrap first checks normal import
- falls back to source-tree path bootstrap
- eventually source-tree bootstrap can be deprecated
This provides a clean migration path without forcing it now.
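The "normal import first, source-tree fallback second" ordering could be expressed roughly as follows. This is a sketch under assumptions: `import_with_source_fallback` is a hypothetical name, and the caller is assumed to already know the source root (for example from the bootstrap helper).

```python
import importlib
import sys
from pathlib import Path
from types import ModuleType


def import_with_source_fallback(package: str, source_root: Path) -> ModuleType:
    """Prefer an installed package; fall back to a single source-tree path entry."""
    try:
        # Step 1: normal import. This succeeds once the package is installed.
        return importlib.import_module(package)
    except ImportError:
        pass

    # Step 2: source-tree fallback. Add exactly one entry, then retry.
    entry = str(source_root)
    if entry not in sys.path:
        sys.path.insert(0, entry)
    importlib.invalidate_caches()
    return importlib.import_module(package)
```

Deprecating the fallback later would then mean removing step 2 (or turning it into a warning) without touching any caller.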
Dependency injection design
DI strategy
Use lightweight dependency injection:
- constructor injection
- explicit factories
- composition at entrypoints and facades
- no framework required
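A minimal sketch of this style, using a deliberately simplified runner protocol: `SubmitService`, `RecordingRunner`, and `make_submit_service` are hypothetical names, and `sbatch` stands in for whichever scheduler command a site actually uses.

```python
import subprocess
from dataclasses import dataclass
from typing import List, Protocol, Sequence


class ProcessRunner(Protocol):
    def run(self, argv: Sequence[str]) -> int: ...


class SubprocessRunner:
    """Real implementation, chosen only at composition time."""

    def run(self, argv: Sequence[str]) -> int:
        return subprocess.run(list(argv), check=False).returncode


class RecordingRunner:
    """Test double that records commands instead of executing them."""

    def __init__(self) -> None:
        self.calls: List[List[str]] = []

    def run(self, argv: Sequence[str]) -> int:
        self.calls.append(list(argv))
        return 0


@dataclass
class SubmitService:
    """Hypothetical core service; the runner arrives through the constructor."""

    runner: ProcessRunner

    def submit(self, job_script: str) -> bool:
        return self.runner.run(["sbatch", job_script]) == 0


def make_submit_service() -> SubmitService:
    """Explicit factory: the one place that picks the real implementation."""
    return SubmitService(runner=SubprocessRunner())
```

Entrypoints and facades would call the factory, while tests construct `SubmitService(runner=RecordingRunner())` directly and inspect `calls`, with no scheduler involved.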
Injectable boundaries
Inject only side-effecting or environment-specific services:
- filesystem
- subprocess or process execution
- environment variables
- XML store
- scheduler integration
- clock or time
- bootstrap or path resolver
- plugin or customization loading
Example interfaces
from __future__ import annotations  # allows "Path | None" hints before Python 3.10

from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Protocol, Sequence


@dataclass(frozen=True)
class CommandResult:
    returncode: int
    stdout: str
    stderr: str


class FileSystem(Protocol):
    def exists(self, path: Path) -> bool: ...
    def is_dir(self, path: Path) -> bool: ...
    def mkdir(self, path: Path, parents: bool = False) -> None: ...
    def read_text(self, path: Path) -> str: ...
    def write_text(self, path: Path, content: str) -> None: ...


class ProcessRunner(Protocol):
    def run(
        self,
        argv: Sequence[str],
        cwd: Path | None = None,
        env: Mapping[str, str] | None = None,
        timeout: int | None = None,
    ) -> CommandResult: ...


class EnvironmentProvider(Protocol):
    def get(self, key: str, default: str | None = None) -> str | None: ...


class Clock(Protocol):
    def now_iso(self) -> str: ...

Core subsystem guidance
CIME/core/build/
Owns:
- build planning
- build orchestration
- build dependency logic
Externally visible wrappers remain in CIME/build_scripts.
CIME/core/batch/
Owns:
- submit and resubmit behavior
- scheduler abstraction
- batch dependency rules
- recovery-aware submission flow
CIME/core/config/
Owns:
- typed config models
- runtime config loading
- customization coordination
- bootstrap for non-installed source-tree execution
CIME/core/xml/
Owns:
- XML store abstraction
- caching policy
- schema validation
- domain mapping from XML to typed objects
CIME/core/status/
Owns:
- workflow state tracking
- case status writes
- structured events
- state transition rules
CIME/core/locking/
Owns:
- lock-file policy
- mutation guards
- concurrency-sensitive file handling
CIME/core/tools/
Owns:
- shared tool support logic only
It should not become a generic dumping ground.
Case refactor strategy
Preserve public role
Case remains the primary user-facing class for now.
Change implementation role
Internal logic may be extracted from Case incrementally so that Case
increasingly serves as a compatibility facade over core services.
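The facade idea can be sketched as follows. The `Case` class here is a drastically simplified stand-in, not the real class or its real signatures, and `CaseValueStore` is a hypothetical core service invented for the example.

```python
from typing import Dict, Optional


class CaseValueStore:
    """Hypothetical core service that owns variable lookup and updates."""

    def __init__(self, values: Optional[Dict[str, str]] = None) -> None:
        self._values = dict(values or {})

    def get(self, name: str) -> Optional[str]:
        return self._values.get(name)

    def set(self, name: str, value: str) -> None:
        self._values[name] = value


class Case:
    """Facade: keeps the public method names, delegates the work."""

    def __init__(self, case_root: str, store: Optional[CaseValueStore] = None) -> None:
        self._case_root = case_root
        # Default wiring keeps existing Case(case_root) call sites working;
        # tests and new code may pass a store explicitly.
        self._store = store if store is not None else CaseValueStore()

    def get_value(self, name: str) -> Optional[str]:
        return self._store.get(name)

    def set_value(self, name: str, value: str) -> None:
        self._store.set(name, value)
```

Callers keep using the same methods on `Case`, while the logic behind them can move into core services one method at a time.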
Compatibility requirement
Existing external code that imports or uses Case should continue to work with
minimal or no changes wherever feasible.
Physical move timing
Case should remain in its current location until the final stage of migration.
A physical move of Case to a new package location, if still desired at that
point, should occur only as part of one coordinated downstream model migration
across supported consumers.
This avoids repeated churn for downstream integrations and keeps the current
import surface stable during the refactor.
Resiliency design
Typed errors
Replace broad generic fatal control flow with typed exceptions.
Example hierarchy:
class CimeError(Exception):
    pass


class ConfigurationError(CimeError):
    pass


class ValidationError(CimeError):
    pass


class FilesystemError(CimeError):
    pass


class ExternalCommandError(CimeError):
    pass


class RetryableExternalCommandError(ExternalCommandError):
    pass


class LockingError(CimeError):
    pass


class StateTransitionError(CimeError):
    pass

Retry and recovery policy
Move retry and recovery policy into explicit services:
- FailureClassifier
- RetryPolicy
- RecoveryService
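A minimal sketch of how the first two services might cooperate is shown below. The method names (`is_retryable`, `call`), the attempt limit, and the exponential backoff are assumptions made for illustration; `RecoveryService` is not covered, and the two exception classes only mirror part of the hierarchy above so that the block is self-contained.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ExternalCommandError(Exception):
    pass


class RetryableExternalCommandError(ExternalCommandError):
    pass


class FailureClassifier:
    """Decides whether a failure is worth retrying."""

    def is_retryable(self, exc: Exception) -> bool:
        return isinstance(exc, RetryableExternalCommandError)


class RetryPolicy:
    """Retries retryable failures with exponential backoff."""

    def __init__(
        self,
        classifier: FailureClassifier,
        max_attempts: int = 3,
        base_delay: float = 1.0,
        sleep: Callable[[float], None] = time.sleep,  # injectable for tests
    ) -> None:
        self.classifier = classifier
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self.sleep = sleep

    def call(self, operation: Callable[[], T]) -> T:
        attempt = 1
        while True:
            try:
                return operation()
            except Exception as exc:
                out_of_attempts = attempt >= self.max_attempts
                if out_of_attempts or not self.classifier.is_retryable(exc):
                    raise  # final failure: propagate the typed error unchanged
                self.sleep(self.base_delay * 2 ** (attempt - 1))
                attempt += 1
```

Because `sleep` is injected, tests can pass a recording function and verify the backoff schedule without waiting.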
Explicit workflow states
For critical operations like submit, run, and archive, define explicit states
such as:
- READY
- VALIDATING
- SUBMITTING
- RUNNING
- FAILED_RETRYABLE
- RECOVERING
- SUCCEEDED
- FAILED_FINAL
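These states could be modeled as an enum with an explicit transition table, as in the sketch below. The state names come from the list above; the specific allowed transitions are an assumption made for illustration and are not specified by this RFC.

```python
from enum import Enum


class WorkflowState(Enum):
    READY = "READY"
    VALIDATING = "VALIDATING"
    SUBMITTING = "SUBMITTING"
    RUNNING = "RUNNING"
    FAILED_RETRYABLE = "FAILED_RETRYABLE"
    RECOVERING = "RECOVERING"
    SUCCEEDED = "SUCCEEDED"
    FAILED_FINAL = "FAILED_FINAL"


class StateTransitionError(Exception):
    pass


S = WorkflowState

# Assumed transition rules; the real rules would be decided per workflow.
ALLOWED = {
    S.READY: {S.VALIDATING},
    S.VALIDATING: {S.SUBMITTING, S.FAILED_FINAL},
    S.SUBMITTING: {S.RUNNING, S.FAILED_RETRYABLE, S.FAILED_FINAL},
    S.RUNNING: {S.SUCCEEDED, S.FAILED_RETRYABLE, S.FAILED_FINAL},
    S.FAILED_RETRYABLE: {S.RECOVERING, S.FAILED_FINAL},
    S.RECOVERING: {S.SUBMITTING, S.FAILED_FINAL},
    S.SUCCEEDED: set(),      # terminal
    S.FAILED_FINAL: set(),   # terminal
}


def transition(current: WorkflowState, target: WorkflowState) -> WorkflowState:
    """Return the new state, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise StateTransitionError(f"{current.name} -> {target.name} is not allowed")
    return target
```

Keeping the rules in one table makes illegal moves (for example, leaving a terminal state) fail loudly instead of silently corrupting case status.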
Idempotency
Prioritize idempotency for:
- lock and unlock operations
- status updates
- regeneration steps
- archive and restore steps
- submit and resubmit transitions
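For the first item, idempotency means that repeating a call leaves the same end state and does not raise. The sketch below shows that property only; the `.lock` marker-file layout and the function names are hypothetical, and the code does not attempt mutual exclusion between processes.

```python
from pathlib import Path


def lock_file(lock_dir: Path, name: str) -> Path:
    """Create a lock marker; calling it again leaves the same end state."""
    lock_dir.mkdir(parents=True, exist_ok=True)
    marker = lock_dir / f"{name}.lock"
    marker.touch(exist_ok=True)  # no error if the marker already exists
    return marker


def unlock_file(lock_dir: Path, name: str) -> None:
    """Remove the lock marker; a missing marker is not an error."""
    try:
        (lock_dir / f"{name}.lock").unlink()
    except FileNotFoundError:
        pass  # already unlocked: nothing to do
```

A recovery path can then safely re-run lock or unlock steps after a partial failure without first checking what state the previous attempt reached.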
External model compatibility plan
CIME compatibility must be evaluated against at least these active downstream
model integrations:
- E3SM (E3SM-Project/E3SM)
- CESM (ESCOMP/CESM)
- NorESM (NorESMhub/NorESM)
Why this is a hard constraint
These downstream models rely on CIME-centered case and workflow behavior.
Refactors that change source-tree execution, script invocation, bootstrap
behavior, Case, or build_scripts can affect their workflows.
Compatibility rules
- preserve current import surfaces where possible
- preserve case-created symlink tool behavior
- preserve build_scripts
- preserve existing script entrypoints or provide stable wrappers
- avoid requiring downstream models to adopt installed-package semantics
- validate major refactors against representative downstream integration paths
from E3SM, CESM, and NorESM
build_scripts compatibility surface
For build_scripts, the following are part of the strict compatibility surface:
- command names
- invocation patterns
- accepted arguments and options
- argument semantics
- user-visible parsing behavior
- argument-handling behavior
- exit behavior tied to argument handling
Implementation beneath that layer may move into CIME/core/build/ or other core
packages.
Downstream regression checkpoints
Major refactors should be validated against representative downstream workflows
for E3SM, CESM, and NorESM.
At minimum, regression checkpoints should include:
- source-tree CIME usage
- case creation workflow
- symlinked case-tool execution
- standard setup, build, and run path
Cross-model checkpoint categories
For all three downstream models, validate:
- import and bootstrap behavior in non-installed mode
- user-facing script invocation compatibility
- Case compatibility at current import location
- build_scripts argument parsing compatibility
- no unexpected changes in standard case lifecycle sequencing
This should be treated as a migration validation requirement for major refactor
milestones.
Proposed migration phases
Phase 1 — Compatibility bootstrap and seams
- create centralized source-tree bootstrap for non-installed imports
- add FileSystem, ProcessRunner, EnvironmentProvider, Clock
- leave behavior unchanged
Phase 2 — Error model and status layer
- add typed exceptions
- add workflow and status services
- improve failure classification in high-value paths
Phase 3 — Build and batch extraction
- move build logic into CIME/core/build/
- keep CIME/build_scripts as compatibility wrapper layer
- move submit logic into CIME/core/batch/
Phase 4 — Case internal extraction
- continue extracting logic behind Case
- preserve current import location and compatibility methods
- delegate to core services
Phase 5 — XML and config extraction
- create CIME/core/xml/ and typed config loading
- reduce process-global XML caching behavior
Phase 6 — Optional CLI
- implement CIME/cli/ only if desired
- keep it thin and optional
Phase 7 — Coordinated downstream migration
- perform one coordinated downstream model migration if needed
- consider moving Case only at this stage
- support installed-package mode when available
- keep source-tree bootstrap fallback until migration is complete
Revised target tree
CIME/
    api/
        downloader.py
    cli/                # optional
    core/
        build/
        baseline/
        batch/
        config/
            bootstrap.py
        compare/
        downloader/
        locking/
        mods/
        namelist/
        status/
        timing/
        tools/
        xml/
        case/           # optional addition
        runtime/        # optional addition
        plugins/        # optional addition
    data/
    non_py/
    build_scripts/
    SystemsTests/
    tests/
Recommended first implementation slice
Given these constraints, the recommended first slice is:
Slice 1
- add centralized source-tree bootstrap helper for symlinked tools
- add typed exception hierarchy
- add ProcessRunner abstraction
- add FileSystem abstraction
Slice 2
- move batch and submit logic into CIME/core/batch/
- keep current entrypoints and wrappers intact
Slice 3
- move build logic into CIME/core/build/
- keep CIME/build_scripts stable and delegating
Slice 4
- begin extracting logic behind Case
- preserve external consumer behavior