Skip to content

schema - optional with defaults - improved behaviour (not injecting the default in the index)#246

Open
danovaro wants to merge 2 commits intodevelopfrom
feature/optional-default
Open

schema - optional with defaults - improved behaviour (not injecting the default in the index)#246
danovaro wants to merge 2 commits intodevelopfrom
feature/optional-default

Conversation

@danovaro
Copy link
Member

@danovaro danovaro commented Mar 7, 2026

…e index

Description

Contributor Declaration

By opening this pull request, I affirm the following:

  • All authors agree to the Contributor License Agreement.
  • The code follows the project's coding standards.
  • I have performed self-review and added comments where needed.
  • I have added or updated tests to verify that my changes are effective and functional.
  • I have run all existing tests and confirmed they pass.

🌈🌦️📖🚧 Documentation FDB 🚧📖🌦️🌈
https://sites.ecmwf.int/docs/dev-section/fdb/pull-requests/PR-246

@codecov-commenter
Copy link

codecov-commenter commented Mar 7, 2026

Codecov Report

❌ Patch coverage is 97.95918% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.40%. Comparing base (2a5f6dd) to head (3c00e56).

Files with missing lines Patch % Lines
src/fdb5/rules/Rule.cc 91.66% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #246      +/-   ##
===========================================
+ Coverage    71.20%   71.40%   +0.20%     
===========================================
  Files          376      375       -1     
  Lines        23762    23732      -30     
  Branches      2479     2477       -2     
===========================================
+ Hits         16920    16947      +27     
+ Misses        6842     6785      -57     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@danovaro danovaro force-pushed the feature/optional-default branch 3 times, most recently from 9e31c06 to 9dbdbf6 Compare March 9, 2026 22:40
@danovaro danovaro changed the title in case of optional keyword, with a default, do not inject it into th… schema - optional with defaults - improved behaviour (not injecting the default in the index) Mar 11, 2026
@danovaro danovaro force-pushed the feature/optional-default branch from ea5454d to d2eb5f4 Compare March 11, 2026 09:31
@danovaro danovaro requested review from Copilot and simondsmart March 11, 2026 09:32
@danovaro danovaro marked this pull request as ready for review March 11, 2026 09:34
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adjusts FDB schema/rule handling for optional attributes with defaults so that default values are not implicitly injected into index keys/axes, and adds/updates tests to validate the new behavior (including a new timespan test suite).

Changes:

  • Update rule matching/key expansion logic for optional predicates with defaults, and adjust optional matcher behavior.
  • Suppress “empty-only” axes entries in fdb-axes output (text and JSON) and update expectations accordingly.
  • Add a new timespan test suite with multiple schema layouts to validate indexing/listing/retrieval behavior.

Reviewed changes

Copilot reviewed 26 out of 27 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
tests/fdb/tools/fdb_axes.sh.in Updates expected fdb-axes outputs to reflect omission of empty-only axes.
tests/fdb/timespan/timespan.sh.in New integration test script exercising timespan indexing/retrieval under multiple schema layouts.
tests/fdb/timespan/schema1 New test schema variant (timespan default at top level).
tests/fdb/timespan/schema2 New test schema variant (timespan default at second level).
tests/fdb/timespan/schema3 New test schema variant (timespan default at index level).
tests/fdb/timespan/req New sample retrieve request for timespan combinations.
tests/fdb/timespan/local1.yaml New local config referencing schema1.
tests/fdb/timespan/local2.yaml New local config referencing schema2.
tests/fdb/timespan/local3.yaml New local config referencing schema3.
tests/fdb/timespan/in.grib New GRIB test input data for timespan tests.
tests/fdb/timespan/CMakeLists.txt Registers the new timespan test with CTest/ecbuild.
tests/fdb/test_service.cc Modernizes loops/assertions and adjusts parameter strings in tests.
tests/fdb/etc/fdb/schema Updates the test schema content/types and adds timespan-related rules/keywords.
tests/fdb/daos/test_daos_catalogue.cc Adjusts MarsRequest creation via Key::request() default verb.
tests/fdb/api/test_select.cc Builds requests via MarsParser and ensures inspect requests are non-empty.
tests/fdb/api/test_fdb_c.cc Adjusts metadata iteration expectations to skip empty optional metadata entries.
tests/fdb/api/test_dist.cc Ensures inspect request has a verb (MarsRequest{\"retrieve\"}).
tests/fdb/api/test_callback.cc Updates archived type used in callback test.
tests/fdb/api/test_auxiliary.cc Updates archived type used across auxiliary/wipe tests.
tests/fdb/CMakeLists.txt Adds the new timespan subdirectory to the test build.
src/fdb5/tools/compare/grib/CompareBitwise.cc Removes a leftover commented debug print.
src/fdb5/tools/compare/fdb-compare.cc Removes an unused member variable.
src/fdb5/rules/Rule.cc Updates optional/default handling in key matching and key expansion logic.
src/fdb5/rules/MatchOptional.cc Changes missing optional value lookup to return empty rather than the default.
src/fdb5/database/IndexAxis.cc Skips emitting axes that only contain a single empty value (text/JSON).
src/fdb5/api/FDB.cc Filters out postproc/sink parameters before inspecting (index-relevant request only).
src/fdb5/CMakeLists.txt Comments out building fdb-patch under HAVE_GRIB.
Comments suppressed due to low confidence (2)

src/fdb5/rules/Rule.cc:229

  • In findMatchingKey(const eckit::StringList& values), the assertion now allows values.size() to be smaller than predicates_.size() (when optionals exist), but the loop still does values.at(i) for every predicate. If the fingerprint/tokenizer drops trailing empty fields, this can throw std::out_of_range instead of treating missing optional values as empty/default. Consider iterating values and predicates together and, when values runs out, only accepting remaining predicates if they are optional (using an empty string/default as appropriate).
    size_t numPred = predicates_.size();
    for (const auto& p : predicates_) {
        if (p->optional()) {
            numPred--;
        }
    }
    ASSERT(values.size() >= numPred);

    TypedKey key(registry_);

    for (auto iter = predicates_.begin(); iter != predicates_.end(); ++iter) {
        const auto& pred = *iter;

        const auto& keyword = pred->keyword();

        /// @note 1-1 order between predicates and values
        const auto& value = values.at(iter - predicates_.begin());

        if (!pred->match(value)) {

src/fdb5/rules/Rule.cc:438

  • tryFill() now asserts values.size() >= numPred (excluding optional predicates), but the fill loop still expects all predicates to be consumed (it_pred == end). If the serialized fingerprint omits trailing empty optional fields, tryFill() will return false (and fill() will ASSERT) even though the remaining predicates are optional. Adjust the termination/consumption checks so that missing trailing optional values are accepted, and make sure the mapping between values and predicates stays aligned.
    size_t numPred = predicates_.size();
    for (const auto& p : predicates_) {
        if (p->optional()) {
            numPred--;
        }
    }
    ASSERT(values.size() >= numPred);  // Should be equal, except for quantile (FDB-103)
    ASSERT(values.size() <= predicates_.size() + 1);

    auto it_value = values.begin();
    auto it_pred = predicates_.begin();

    for (; it_pred != predicates_.end() && it_value != values.end(); ++it_pred, ++it_value) {

        if (values.size() == (predicates_.size() + 1) && (*it_pred)->keyword() == "quantile") {
            std::string actualQuantile = *it_value;
            ++it_value;
            if (it_value == values.end()) {
                return false;
            }
            actualQuantile += std::string(":") + (*it_value);
            (*it_pred)->fill(key, actualQuantile);
        }
        else {
            (*it_pred)->fill(key, *it_value);
        }
    }

    // Check that everything is exactly consumed
    if (it_value != values.end()) {
        return false;
    }
    if (it_pred != predicates_.end()) {
        return false;
    }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


# * Attributes can be typed
# Globally, at the beginning of this file:
# Globally, at the begining of this file:
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling: "begining" should be "beginning" in this comment.

Suggested change
# Globally, at the begining of this file:
# Globally, at the beginning of this file:

Copilot uses AI. Check for mistakes.
[ step, param, frequency?, direction? ]]
[ type=me, levtype=ml, timespan?none
[ step?, number-, levelist, param, coeffindex? ]]
[ type=4i, levtype=ml, tmespan?
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This schema line uses tmespan?, which looks like a typo for timespan? (and likely intended to mirror the surrounding timespan?none usage). As written, it will introduce a separate tmespan keyword into the schema, which can break rule matching and indexing for this branch.

Suggested change
[ type=4i, levtype=ml, tmespan?
[ type=4i, levtype=ml, timespan?none

Copilot uses AI. Check for mistakes.
[ step, param, frequency?, direction? ]]
[ type=me, levtype=ml, timespan?none
[ step?, number-, levelist, param, coeffindex? ]]
[ type=4i, levtype=ml, tmespan?
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This schema line uses tmespan?, which looks like a typo for timespan? (and likely intended to be timespan?none like the other branches). As written, it defines a different keyword and will prevent the timespan rules from applying as expected.

Suggested change
[ type=4i, levtype=ml, tmespan?
[ type=4i, levtype=ml, timespan?none

Copilot uses AI. Check for mistakes.
Comment on lines 475 to 479
if ( HAVE_GRIB )
list( APPEND fdb5_tools
fdb-hammer
fdb-patch
# fdb-patch
)
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fdb-patch has been commented out from the list of built tools under HAVE_GRIB. If this is not intentional, it will remove the executable from builds/installs. If it is intentional, it would help to add a short comment explaining why the tool is being disabled (and ideally gate it behind an option) to avoid an unexpected feature regression.

Copilot uses AI. Check for mistakes.
Comment on lines +50 to +53
int err;
v = nullptr;
// skip empty values (optional metadata)
while (v == nullptr || strlen(v) == 0) {
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop that skips empty optional metadata values can become infinite if fdb_splitkey_next_metadata() returns FDB_ITERATION_COMPLETE (or any error) without setting v to a non-empty string. It also ignores err until after the loop. Consider checking err on each iteration and breaking/failing on non-FDB_SUCCESS, and/or bounding the loop so iteration-complete cannot spin forever.

Suggested change
int err;
v = nullptr;
// skip empty values (optional metadata)
while (v == nullptr || strlen(v) == 0) {
int err = FDB_SUCCESS;
v = nullptr;
// skip empty values (optional metadata)
while ((v == nullptr || strlen(v) == 0) && err == FDB_SUCCESS) {

Copilot uses AI. Check for mistakes.
# * Attributes can be optional
# [ step, levelist?, param ]
# They will be replaced internally by an empty value. It is also possible to provide a default substitution value: e.g. [domain?g] will consider the domain to be 'g' if missing.
# They will be replaced internally by an empty value. It is also posiible to provide a default subtitution value: e.g. [domain?g] will consider the domain to be 'g' if missing.
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling: "posiible" should be "possible" in this comment.

Suggested change
# They will be replaced internally by an empty value. It is also posiible to provide a default subtitution value: e.g. [domain?g] will consider the domain to be 'g' if missing.
# They will be replaced internally by an empty value. It is also possible to provide a default subtitution value: e.g. [domain?g] will consider the domain to be 'g' if missing.

Copilot uses AI. Check for mistakes.
# * Attributes can be optional
# [ step, levelist?, param ]
# They will be replaced internally by an empty value. It is also possible to provide a default substitution value: e.g. [domain?g] will consider the domain to be 'g' if missing.
# They will be replaced internally by an empty value. It is also posiible to provide a default subtitution value: e.g. [domain?g] will consider the domain to be 'g' if missing.
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling: "subtitution" should be "substitution" in this comment.

Suggested change
# They will be replaced internally by an empty value. It is also posiible to provide a default subtitution value: e.g. [domain?g] will consider the domain to be 'g' if missing.
# They will be replaced internally by an empty value. It is also possible to provide a default substitution value: e.g. [domain?g] will consider the domain to be 'g' if missing.

Copilot uses AI. Check for mistakes.
Comment on lines 296 to 321
std::vector<Key> Rule::findMatchingKeys(const metkit::mars::MarsRequest& request) const {

RuleGraph graph;

for (const auto& pred : predicates_) {

const auto& keyword = pred->keyword();
const auto& type = registry_.lookupType(keyword);


const auto& values = pred->values(request);
auto values = pred->values(request);

/// @note do we want to allow empty values?
// if (values.empty() && pred->optional()) { values.push_back(pred->defaultValue()); }
/// if the request does not have the keyword, but the predicate is optional, then use the default value
if (values.empty() && pred->optional()) {
values.push_back(pred->defaultValue());
}

auto& node = graph.push(keyword);

for (const auto& value : values) {
if (pred->match(value)) {
node.emplace_back(value);
}
}

/// if there are no matching values for this predicate, then there are no matching keys for the rule
if (node.empty()) {
return {};
}
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

findMatchingKeys(const MarsRequest&) uses pred->values(request) directly. For optional-with-default predicates, MatchOptional::values() returns the default when the keyword is missing, so this path never considers the empty-string alternative. That can prevent tools that rely on Schema::expandDatabase() (e.g. fdb-root) from matching databases indexed with an omitted optional (stored as empty) when the request omits the keyword. Align this implementation with the ReadVisitor overload: explicitly detect missing keywords and add both defaultValue() and "" (when appropriate) to the candidate values.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants