Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces two new base classes for modular code organization in CHM: data_base for per-module data management with caching capabilities, and base_step for implementing submodule execution steps using the CRTP pattern. The data_base class provides lazy evaluation, automatic cache invalidation based on timesteps, and type-safe output handling.
Key changes:
- New
data_base<CacheType>template class with cache management, lazy evaluation viaupdate_value(), and output accumulation viaset_output() - New
base_step<Derived, Data>CRTP base class enforcingexecute_impl()implementation in derived classes - Comprehensive test suite for
data_basefunctionality including cache initialization, timestep-based invalidation, and default value handling
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| src/modules/submodules/data_base.hpp | Implements the core data_base template class with caching, concepts for type safety, and cache lifecycle management |
| src/modules/submodules/base_step.hpp | Implements CRTP-based base_step class for enforcing submodule execution interface at compile-time |
| src/tests/submoduletests/test_data_base.cpp | Provides comprehensive unit tests for data_base including cache behavior, value updates, and default value verification |
| src/CMakeLists.txt | Adds submodules directory to includes and integrates new test file (though most existing tests are commented out) |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Can you comment on the directory structure? Currently the submodule directory is a subdirectory of modules. But it could instead just be in the src directory. As in: src/modules/submodulesvs src/modules and src/submodules |
|
Done + above comment |
|
I thought about converting this to a draft because I think what is here could be better. In particular, functions required by submodules will look like this: double my_module::data::air_temperature() {
update_value(
[this]() -> auto& { return cache_->air_temperature; },
[this]() { return (*face)["air_temperature"_s]; }
);
return cache_->air_temperature;
}which means constructing the lambda on every single call. And it just has bad boiler plate. One idea I have is to find a way to create these lambdas in the data constructor. something like private:
// Store bound functions as members
std::function<double&()> air_temp_cache_getter_;
std::function<double()> air_temp_face_getter_;
std::function<double&()> water_vapour_cache_getter_;
std::function<double()> water_vapour_face_getter_;
public:
data() {
// Initialize getters once in constructor
air_temp_cache_getter_ = [cache_]() -> double& {
return cache->air_temperature;
};
air_temp_face_getter_ = [face]() {
return (*face)["air_temperature"_s];
};so that double my_module::data::air_temperature()
{
return update_value(air_temp_cache_getter_,air_temp_face_getter_);
}where now This the same amount of boilerplate really, but certainly eliminates the repeated creation of lambdas. For now, I think this version is OK and if we notice performance problems then it can be improved. Also worth noting that macros might be useful here since all the functions that use the cache system look the same. |
`base_step` name comes from the idea that submodules are steps in a
module.
Uses CRTP to avoid overhead. Derived classes are defined as follows:
```cpp
template<typename data>
class step : public base_step<step,data>
{
public:
void execute_impl(data& d)
{
// Implementation here
}
}
```
`data` is intended to be the module specific data class derived from
`face_info`.
Derived classses should make use of concepts to enforce getters and
setters on data. Example
```
\#include <concepts>
template<typename T>
concept StepConcept = requires(T& t)
{
{ t.input1() } -> std::floating_point;
{ t.input2() } -> std::integral;
{ t.output1(std::declval<double>()) } -> std::same_as<void>
}
template<StepConcept data>
class step : public base_step<step,data>
{
public:
void execute_impl(data& d)
{
// Implementation here
}
}
```
At the moment, it is recommended to supply everything via functions,
even parameters that are set at run time but never changed.
For parameters that are domain wide one could write a function:
```cpp
const double module::data::foo()
{
static const double result = cfg.get("result");
return result;
}
```
Static ensures that the instance is (a) uniform per instance of `data`,
and (b) set once (thread safe).
It does involve a hidding bool and if check that is evaluated at runtime
on every call.
Parameters that are per-face must have a private variable. Designing
submodules derived from `base_step` with getters and setters provides
flexibility for the writer.
**Constness** `GuaranteeImplementExecute` now accepts `const T&` and `Data&` types. This ensures that (a) `T` has a function `void execute_impl` that takes a `Data&` and (b) `T` can be `const` and or a reference. **Concepts** The previous concept was circular: a derived class would need to be declared like: ```cpp template<typename Data> class Derived : public base_step<Derived,Data> ``` But since `Derived` is not fully defined, it cannot satisfy the concept. Instead, this is replaced with a static assert that is only tripped if `execute` is called. It is still tripped at compile time and therefore has the same effect. **Constructors** The constructor is now explicitly declared and is protected.
Introducing new base class for the data classes owned by each module.
Coordinates per-time-step caching and access to triangle-specific data.
`data_base` is the base class. It expects a single template parameter `CacheType`. `CacheType` is always
derived from `cache_base` and this is enforced by the CacheRules concept.
`CacheType` is unique to each module, and at least should contain a
single variable member for each depends and provides variable defined in
the module constructor. Each is used to store values dereferenced from
face (inputs) or values to be set to `face` later (outputs), and
released at the end of a time-step.
`cache_base` is a very basic struct with one member variable and one static constexpr member function:
`int64_t last_timestep` Initialized as -1. Keeps track of the current
time step. Somewhat deprecated, but originally included to ensure that a
cache would always be reset on a new timestep.
```cpp
template<typename T>
static constexpr T default_value();
```
`default_value<T>()` returns a sentinal value used to initialize
variables on the `CacheType` objected derived from `cache_base`. It sets
floats to `std::numeric_limits<T>::quiet_NaN()` and integral types to
`std::numeric_limits<T>::min()`. The latter type is not the safest at
the moment considering that unsigned ints exist and may need to be
adjusted later. These sentinal values exist so that each member can be
marked as cached (or not cached) separately.
An example of `CacheType` is as follows:
```cpp
class Cache : public cache_base
{
// Has to be NaN does not have to be this version of NaN
double input_variable = default_value<double>();
int input2 = default_value<int>();
// Outputs are stored as 0.0
double output_variable = 0.0;
};
```
`data_base` is written specifically to cater to the `base_step` class. Derived classes of base_step
have a single template parameter to a class/struct to access inputs and parameters. It is intended
that the data class is used in this template.
`data_base` provides tools to write functions to that the `base_step` template expects to pass inputs
and parameters set at runtime.
`data_base` has private, protected and public members.
**Private:**
```cpp
template<typename T>
bool constexpr check_if_set(T t);
```
This `constexpr` function checks if an input is equal to its sentinal
value.
`bool is_stale();`: Checks if `cache_base::last_timestep` is not equal
to `global_param->timestep_counter`. Means that the cache was
(mistakenly) not reset at the end of the last timestep. Typically a
programmer error. May be deprecated but technically works.
`void init_cache();`: Checks if the cache (`cache_` protected member,
see below) has been initialized. If not, it is initialized. After which
all members are set to their sentinal values.
**Protected:**
The constructor is protected so that only derived classes can create
this class.
```cpp
data_base(const mesh_elem& face_in, const boost::shared_ptr<global> param,
const pt::ptree& cfg, const bool istest = false);
```
Accepts three arguments, a `const mesh_elem&`, `const
boost::shared_ptr<global>` and `const pt::ptree&`. There is also a
`const bool` argument that defaults to `false`. This last argument
exists just to make googletests easier without needed to mock the
`mesh_elem` object. Allows `face` and `global_param` to be NULL for
tests.
Each of the objects in the constructor are protected as they are
intended to be used by the derived class for better functions for
submodules.
`mutable std::optional<CacheType> cache_;`: This is the cache object.
Mutable so that it can be modified in `const` methods. Accessing
inputs in submodules is a const-like operation in that the inputs aren't
being modified, but the cache is of course modified on first call.
Before we can discuss these functions, we have to first introduce the
concepts `ValueRules` and `OutputRules<T>`. The former ensures that the
type is (a) callable and (b) produces an l-value reference of the type
that the function outputs. The latter ensures that the type obeys
`ValueRules` AND its output is convertible to `T`.
As we will see below, where `ValueRules` is used, the output type is not
important or part of the function signature. Whereas for where
`OutputRules<T>` is used, the output type `T` is part of the function
signature (and is the type to be output). It is labeled `convertible_to`
as to not require that the user use exactly the same type everywhere and
implicit converstion is OK.
Two functions:
```cpp
template<ValueRules Value,typename Fetch>
void update_value(Value&& value, const Fetch& fetch);
template<typename T,OutputRules<T> Output>
void set_output(Output&& output,const T t);
```
`update_value` receives two arguments. One of type `Value&&` and the
other of type `const Fetch&`. Both are callables but only one has a type
that can be confirmed (I think).
`value` is a callable that returns an l-value reference. Writting it as
`Value&&` means that `value` can be an r-value reference. That means in
the call `update_value`, the first argument can be a `lambda` written in
the function call. See the `input_variable` example at the end.
`fetch` is also capable of holding an r-value reference, `const Fetch&`
binds to everything. However, unlike `value` it does not need to return
an l-value reference. The output of `fetch` is meant to be copied to the
output of `value`. Often, `fetch` will return something like
`(*face)["input"_s]`. It should be an r-value reference.
See the `input_variable` example at the end.
The `update_value` function calls `init_cache()` before doing anything
else. Allocating the cache if it has not been done yet.
`set_output` is a similar function. The first argument is an r-value
reference to a lambda and outputs an l-value reference. The second
argument is a template `const T`. As is typical in CHM, `T` is a
primitive type, typically a `double`. So not making `T` a reference is
acceptable. The `output` object is, like the `value` object, a lambda
returning an l-value reference to a member of `cache_` so that it can be
modified. Therefore, since there is no restriction on `CacheType`
members, it could certainly be another type. If, later, output could
return an `std::vector` or another type of object that is expensive to
copy, this could be modified.
Note that in the `set_output` function there is the following line:
```cpp
output() += t;
```
And note that the outputs are defaulted to 0. What this does is that it
allowed output to be accumulated or assigned many times. Within one
submodule is unnessary flexibility but if a module has several outputs
then this allows it to be changed several times.
Final note: Its possible, especially for state variables in hydrology,
to increase or decrease. This could happen in submodule 1 but not
submodule 2. In which case, getters can be made for output variables
within a submodule. So then a submodule can have the "starting value"
modify that and later "assign" it to a member function of data. However,
that member should **should not** use `set_output` because of the
incrementation, instead it should have its own logic. In addition, state
variables should never be on the cache.
**Examples**
```cpp
class data : public data_base<Cache>
{
public:
// Example of an input/provides
double input_variable();
// Example of domain wide parameter set in the config
double config_parameter();
// Example of parameter access from global instance
int get_dt();
// Example of output function
void output_variable(const double in);
// Example of a parameter that varies on each face
double spatial_parameter();
};
double data::input_variable() const
{
update_value([this]() -> auto& { return cache_->input_variable;},
[this]() { return (*face)["input_variable"_s];} );
return cache_->input_variable;
};
```
input2() would be the same as above (with variables and string changed)
```cpp
double data::config_parameter() const
{
static const double result = cfg_.get("config_parameter",1.0);
return result;
};
double data::spatial_parameter() const
{
update_value([this]() -> auto& { return cache_->spatial_parameter; },
[this]() -> { return face->veg_parameter("spatial_parameter"_s); } );
return cache_->spatial_parameter;
};
int data::get_dt() const
{
// Don't actually need to do
static const double dt = global_param->dt();
return dt;
};
void data::output_variables(const double out)
{
set_output([this]() -> auto& { return cache_->output_variable; },out);
};
void module_name::run(mesh_elem& face)
{
// run whatever calculations here, using the data object
// manually reset the cache
d.reset_cache();
};
```
`test_data_base` defines a mock cache struct: `MockCache` and a mock data class: `data`, as well as the test fixture for GoogleTests. 6 tests: 1. CacheInitializesOnFirstUpdate 2. CacheRespectsTimestepChanges 3. ManualCacheResetWorks 4. SetOutputUpdatesValue 5. OnlyUpdatesNaNValues 6. ChecksDefaultValue `src/CMakeLists.txt` defines several test `.cpp` files. At least one is not functional. `test_triangulation.cpp` failed. Disabled the rest until they can be tested. Added `submoduletests/test_data_base.cpp` to the list.
Documentation in `data_base.hpp` fixed to match class. Comment in `test_data_base.cpp` changed to add clarity.
1. OutputRules concept now tests to ensure that the += operator functions. 2. Concepts moved to a namespace to avoid name collision as these templates are included in every TU that includes `data_base`.
A second constructor now exists so that instatiations of a test version of `data_base` must be intentional and only succeeds by setting the flag to `true`.
These instructions walk through a simple example to writing a submodule and adding it to a module.
`data_base` class has a constructor to add a mesh_elem reference, shared_ptr to a `global` object, and a const reference to a `config_file` object. These must be passed through the `make_module_data` constructor and this variadic template allows any number of arguments to be passed.
c2d44c5 to
caa12ac
Compare
New data base class
New base class for the per-module data class:
data_base. Useful for writing modules making use of submodules. Example here usingdata_baseandbase_step: https://github.com/djmallum/CHM/blob/develop/src/modules/surface_temperature.hpp.More details in commit messages.
New submodule base class
base_stepname comes from the idea that submodules are steps in a module.Uses CRTP to avoid v-table overhead. Derived classes are defined as follows:
datais intended to be the module specific data class derived fromface_info.Derived classses should make use of concepts to enforce getters and setters on data. Example
At the moment, it is recommended to supply everything via functions, even parameters that are set at run time but never changed.
For parameters that are domain wide one could write a function:
Static ensures that the instance is (a) uniform per instance of
data, and (b) set once (thread safe).It does involve a hidden bool and if check that is evaluated at runtime on every call.
Parameters that are per-face must have a private variable. Designing submodules derived from
base_stepwith getters and setters provides flexibility for the writer.Future work
Following this, can merge:
base_stepdata_baseto couple the calculations.