Skip to content

New Glacier Module#164

Draft
djmallum wants to merge 28 commits intoChrismarsh:developfrom
djmallum:new_module/glacier
Draft

New Glacier Module#164
djmallum wants to merge 28 commits intoChrismarsh:developfrom
djmallum:new_module/glacier

Conversation

@djmallum
Copy link
Copy Markdown
Contributor

Draft of PR for new glacier module as part of the contract with ECCC. Based on PR #155.

TODO:

  • remove unit testing commits and changes
  • Make a clear commit history with better commit names

djmallum and others added 28 commits March 30, 2026 11:20
json, mesh and param file added that was used for unit testing with
point_mode
`base_step` name comes from the idea that submodules are steps in a
module.

Uses CRTP to avoid overhead. Derived classes are defined as follows:

```cpp
template<typename data>
class step : public base_step<step,data>
{
public:
    void execute_impl(data& d)
    {
        // Implementation here
    }
}
```

`data` is intended to be the module specific data class derived from
`face_info`.

Derived classses should make use of concepts to enforce getters and
setters on data. Example

```
\#include <concepts>
template<typename T>
concept StepConcept = requires(T& t)
{
    { t.input1() } -> std::floating_point;
    { t.input2() } -> std::integral;

    { t.output1(std::declval<double>()) } -> std::same_as<void>
}

template<StepConcept data>
class step : public base_step<step,data>
{
public:
    void execute_impl(data& d)
    {
        // Implementation here
    }
}
```

At the moment, it is recommended to supply everything via functions,
even parameters that are set at run time but never changed.

For parameters that are domain wide one could write a function:

```cpp
const double module::data::foo()
{
    static const double result = cfg.get("result");
    return result;
}
```

Static ensures that the instance is (a) uniform per instance of `data`,
and (b) set once (thread safe).

It does involve a hidding bool and if check that is evaluated at runtime
on every call.

Parameters that are per-face must have a private variable. Designing
submodules derived from `base_step` with getters and setters provides
flexibility for the writer.
**Constness**

`GuaranteeImplementExecute` now accepts `const T&` and `Data&` types.
This ensures that (a) `T` has a function `void execute_impl` that
takes a `Data&` and (b) `T` can be `const` and or a reference.

**Concepts**

The previous concept was circular: a derived class would need to be
declared like:

```cpp
template<typename Data>
class Derived : public base_step<Derived,Data>
```

But since `Derived` is not fully defined, it cannot satisfy the concept.
Instead, this is replaced with a static assert that is only tripped if
`execute` is called. It is still tripped at compile time and therefore
has the same effect.

**Constructors**

The constructor is now explicitly declared and is protected.
Introducing new base class for the data classes owned by each module.
Coordinates per-time-step caching and access to triangle-specific data.

`data_base` is the base class. It expects a single template parameter `CacheType`. `CacheType` is always
derived from `cache_base` and this is enforced by the CacheRules concept.

`CacheType` is unique to each module, and at least should contain a
single variable member for each depends and provides variable defined in
the module constructor. Each is used to store values dereferenced from
face (inputs) or values to be set to `face` later (outputs), and
released at the end of a time-step.

`cache_base` is a very basic struct with one member variable and one static constexpr member function:

`int64_t last_timestep` Initialized as -1. Keeps track of the current
time step. Somewhat deprecated, but originally included to ensure that a
cache would always be reset on a new timestep.

```cpp
template<typename T>
static constexpr T default_value();
```

`default_value<T>()` returns a sentinal value used to initialize
variables on the `CacheType` objected derived from `cache_base`. It sets
floats to `std::numeric_limits<T>::quiet_NaN()` and integral types to
`std::numeric_limits<T>::min()`. The latter type is not the safest at
the moment considering that unsigned ints exist and may need to be
adjusted later. These sentinal values exist so that each member can be
marked as cached (or not cached) separately.

An example of `CacheType` is as follows:

```cpp
class Cache : public cache_base
{
    // Has to be NaN does not have to be this version of NaN
    double input_variable = default_value<double>();
    int input2 = default_value<int>();

    // Outputs are stored as 0.0
    double output_variable = 0.0;
};
```

`data_base` is written specifically to cater to the `base_step` class. Derived classes of base_step
have a single template parameter to a class/struct to access inputs and parameters. It is intended
that the data class is used in this template.

`data_base` provides tools to write functions to that the `base_step` template expects to pass inputs
and parameters set at runtime.

`data_base` has private, protected and public members.

**Private:**

```cpp
template<typename T>
bool constexpr check_if_set(T t);
```

This `constexpr` function checks if an input is equal to its sentinal
value.

`bool is_stale();`: Checks if `cache_base::last_timestep` is not equal
to `global_param->timestep_counter`. Means that the cache was
(mistakenly) not reset at the end of the last timestep. Typically a
programmer error. May be deprecated but technically works.

`void init_cache();`: Checks if the cache (`cache_` protected member,
see below) has been initialized. If not, it is initialized. After which
all members are set to their sentinal values.

**Protected:**

The constructor is protected so that only derived classes can create
this class.

```cpp
data_base(const mesh_elem& face_in, const boost::shared_ptr<global> param,
        const pt::ptree& cfg, const bool istest = false);
```

Accepts three arguments, a `const mesh_elem&`, `const
boost::shared_ptr<global>` and `const pt::ptree&`. There is also a
`const bool` argument that defaults to `false`. This last argument
exists just to make googletests easier without needed to mock the
`mesh_elem` object. Allows `face` and `global_param` to be NULL for
tests.

Each of the objects in the constructor are protected as they are
intended to be used by the derived class for better functions for
submodules.

`mutable std::optional<CacheType> cache_;`: This is the cache object.
Mutable so that it can be modified in `const` methods. Accessing
inputs in submodules is a const-like operation in that the inputs aren't
being modified, but the cache is of course modified on first call.

Before we can discuss these functions, we have to first introduce the
concepts `ValueRules` and `OutputRules<T>`. The former ensures that the
type is (a) callable and (b) produces an l-value reference of the type
that the function outputs. The latter ensures that the type obeys
`ValueRules` AND its output is convertible to `T`.

As we will see below, where `ValueRules` is used, the output type is not
important or part of the function signature. Whereas for where
`OutputRules<T>` is used, the output type `T` is part of the function
signature (and is the type to be output). It is labeled `convertible_to`
as to not require that the user use exactly the same type everywhere and
implicit converstion is OK.

Two functions:

```cpp
template<ValueRules Value,typename Fetch>
void update_value(Value&& value, const Fetch& fetch);

template<typename T,OutputRules<T> Output>
void set_output(Output&& output,const T t);
```

`update_value` receives two arguments. One of type `Value&&` and the
other of type `const Fetch&`. Both are callables but only one has a type
that can be confirmed (I think).

`value` is a callable that returns an l-value reference. Writting it as
`Value&&` means that `value` can be an r-value reference. That means in
the call `update_value`, the first argument can be a `lambda` written in
the function call. See the `input_variable` example at the end.

`fetch` is also capable of holding an r-value reference, `const Fetch&`
binds to everything. However, unlike `value` it does not need to return
an l-value reference. The output of `fetch` is meant to be copied to the
output of `value`. Often, `fetch` will return something like
`(*face)["input"_s]`. It should be an r-value reference.
See the `input_variable` example at the end.

The `update_value` function calls `init_cache()` before doing anything
else. Allocating the cache if it has not been done yet.

`set_output` is a similar function. The first argument is an r-value
reference to a lambda and outputs an l-value reference. The second
argument is a template `const T`. As is typical in CHM, `T` is a
primitive type, typically a `double`. So not making `T` a reference is
acceptable. The `output` object is, like the `value` object, a lambda
returning an l-value reference to a member of `cache_` so that it can be
modified. Therefore, since there is no restriction on `CacheType`
members, it could certainly be another type. If, later, output could
return an `std::vector` or another type of object that is expensive to
copy, this could be modified.

Note that in the `set_output` function there is the following line:

```cpp
output() += t;
```

And note that the outputs are defaulted to 0. What this does is that it
allowed output to be accumulated or assigned many times. Within one
submodule is unnessary flexibility but if a module has several outputs
then this allows it to be changed several times.

Final note: Its possible, especially for state variables in hydrology,
to increase or decrease. This could happen in submodule 1 but not
submodule 2. In which case, getters can be made for output variables
within a submodule. So then a submodule can have the "starting value"
modify that and later "assign" it to a member function of data. However,
that member should **should not** use `set_output` because of the
incrementation, instead it should have its own logic. In addition, state
variables should never be on the cache.

**Examples**

```cpp
class data : public data_base<Cache>
{
public:
    // Example of an input/provides
	   double input_variable();
    // Example of domain wide parameter set in the config
	   double config_parameter();
    // Example of parameter access from global instance
	   int get_dt();
    // Example of output function
	   void output_variable(const double in);
        // Example of a parameter that varies on each face
        double spatial_parameter();
};

double data::input_variable() const
{
    update_value([this]() -> auto& { return cache_->input_variable;},
                [this]() { return (*face)["input_variable"_s];} );

    return cache_->input_variable;
};
```

input2() would be the same as above (with variables and string changed)

```cpp
double data::config_parameter() const
{
    static const double result = cfg_.get("config_parameter",1.0);

    return result;
};

double data::spatial_parameter() const
{
    update_value([this]() -> auto& { return cache_->spatial_parameter; },
        [this]() -> { return face->veg_parameter("spatial_parameter"_s); } );
        return cache_->spatial_parameter;
};

int data::get_dt() const
{
    // Don't actually need to do
    static const double dt = global_param->dt();

    return dt;
};

void data::output_variables(const double out)
{
    set_output([this]() -> auto& { return cache_->output_variable; },out);
};

void module_name::run(mesh_elem& face)
{
    // run whatever calculations here, using the data object

    // manually reset the cache
    d.reset_cache();
};
```
`test_data_base` defines a mock cache struct: `MockCache` and a mock
data class: `data`, as well as the test fixture for GoogleTests.

6 tests:

1. CacheInitializesOnFirstUpdate
2. CacheRespectsTimestepChanges
3. ManualCacheResetWorks
4. SetOutputUpdatesValue
5. OnlyUpdatesNaNValues
6. ChecksDefaultValue

`src/CMakeLists.txt` defines several test `.cpp` files. At least one is
not functional. `test_triangulation.cpp` failed. Disabled the rest until
they can be tested. Added `submoduletests/test_data_base.cpp` to the
list.
Documentation in `data_base.hpp` fixed to match class.

Comment in `test_data_base.cpp` changed to add clarity.
1. OutputRules concept now tests to ensure that the += operator
   functions.
2. Concepts moved to a namespace to avoid name collision as these
   templates are included in every TU that includes `data_base`.
A second constructor now exists so that instatiations of a test version
of `data_base` must be intentional and only succeeds by setting the flag
to `true`.
These instructions walk through a simple example to writing a submodule
and adding it to a module.
`data_base` class has a constructor to add a mesh_elem reference,
shared_ptr to a `global` object, and a const reference to a
`config_file` object. These must be passed through the
`make_module_data` constructor and this variadic template allows any
number of arguments to be passed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant