schema - optional with defaults - improved behaviour (not injecting the default in the index)#246
schema - optional with defaults - improved behaviour (not injecting the default in the index)#246
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #246 +/- ##
===========================================
+ Coverage 71.20% 71.40% +0.20%
===========================================
Files 376 375 -1
Lines 23762 23732 -30
Branches 2479 2477 -2
===========================================
+ Hits 16920 16947 +27
+ Misses 6842 6785 -57 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
9e31c06 to
9dbdbf6
Compare
…ecting the default in the index)
ea5454d to
d2eb5f4
Compare
There was a problem hiding this comment.
Pull request overview
This PR adjusts FDB schema/rule handling for optional attributes with defaults so that default values are not implicitly injected into index keys/axes, and adds/updates tests to validate the new behavior (including a new timespan test suite).
Changes:
- Update rule matching/key expansion logic for optional predicates with defaults, and adjust optional matcher behavior.
- Suppress “empty-only” axes entries in
fdb-axesoutput (text and JSON) and update expectations accordingly. - Add a new
timespantest suite with multiple schema layouts to validate indexing/listing/retrieval behavior.
Reviewed changes
Copilot reviewed 26 out of 27 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/fdb/tools/fdb_axes.sh.in | Updates expected fdb-axes outputs to reflect omission of empty-only axes. |
| tests/fdb/timespan/timespan.sh.in | New integration test script exercising timespan indexing/retrieval under multiple schema layouts. |
| tests/fdb/timespan/schema1 | New test schema variant (timespan default at top level). |
| tests/fdb/timespan/schema2 | New test schema variant (timespan default at second level). |
| tests/fdb/timespan/schema3 | New test schema variant (timespan default at index level). |
| tests/fdb/timespan/req | New sample retrieve request for timespan combinations. |
| tests/fdb/timespan/local1.yaml | New local config referencing schema1. |
| tests/fdb/timespan/local2.yaml | New local config referencing schema2. |
| tests/fdb/timespan/local3.yaml | New local config referencing schema3. |
| tests/fdb/timespan/in.grib | New GRIB test input data for timespan tests. |
| tests/fdb/timespan/CMakeLists.txt | Registers the new timespan test with CTest/ecbuild. |
| tests/fdb/test_service.cc | Modernizes loops/assertions and adjusts parameter strings in tests. |
| tests/fdb/etc/fdb/schema | Updates the test schema content/types and adds timespan-related rules/keywords. |
| tests/fdb/daos/test_daos_catalogue.cc | Adjusts MarsRequest creation via Key::request() default verb. |
| tests/fdb/api/test_select.cc | Builds requests via MarsParser and ensures inspect requests are non-empty. |
| tests/fdb/api/test_fdb_c.cc | Adjusts metadata iteration expectations to skip empty optional metadata entries. |
| tests/fdb/api/test_dist.cc | Ensures inspect request has a verb (MarsRequest{\"retrieve\"}). |
| tests/fdb/api/test_callback.cc | Updates archived type used in callback test. |
| tests/fdb/api/test_auxiliary.cc | Updates archived type used across auxiliary/wipe tests. |
| tests/fdb/CMakeLists.txt | Adds the new timespan subdirectory to the test build. |
| src/fdb5/tools/compare/grib/CompareBitwise.cc | Removes a leftover commented debug print. |
| src/fdb5/tools/compare/fdb-compare.cc | Removes an unused member variable. |
| src/fdb5/rules/Rule.cc | Updates optional/default handling in key matching and key expansion logic. |
| src/fdb5/rules/MatchOptional.cc | Changes missing optional value lookup to return empty rather than the default. |
| src/fdb5/database/IndexAxis.cc | Skips emitting axes that only contain a single empty value (text/JSON). |
| src/fdb5/api/FDB.cc | Filters out postproc/sink parameters before inspecting (index-relevant request only). |
| src/fdb5/CMakeLists.txt | Comments out building fdb-patch under HAVE_GRIB. |
Comments suppressed due to low confidence (2)
src/fdb5/rules/Rule.cc:229
- In
findMatchingKey(const eckit::StringList& values), the assertion now allowsvalues.size()to be smaller thanpredicates_.size()(when optionals exist), but the loop still doesvalues.at(i)for every predicate. If the fingerprint/tokenizer drops trailing empty fields, this can throwstd::out_of_rangeinstead of treating missing optional values as empty/default. Consider iterating values and predicates together and, whenvaluesruns out, only accepting remaining predicates if they are optional (using an empty string/default as appropriate).
size_t numPred = predicates_.size();
for (const auto& p : predicates_) {
if (p->optional()) {
numPred--;
}
}
ASSERT(values.size() >= numPred);
TypedKey key(registry_);
for (auto iter = predicates_.begin(); iter != predicates_.end(); ++iter) {
const auto& pred = *iter;
const auto& keyword = pred->keyword();
/// @note 1-1 order between predicates and values
const auto& value = values.at(iter - predicates_.begin());
if (!pred->match(value)) {
src/fdb5/rules/Rule.cc:438
tryFill()now assertsvalues.size() >= numPred(excluding optional predicates), but the fill loop still expects all predicates to be consumed (it_pred == end). If the serialized fingerprint omits trailing empty optional fields,tryFill()will return false (andfill()will ASSERT) even though the remaining predicates are optional. Adjust the termination/consumption checks so that missing trailing optional values are accepted, and make sure the mapping between values and predicates stays aligned.
size_t numPred = predicates_.size();
for (const auto& p : predicates_) {
if (p->optional()) {
numPred--;
}
}
ASSERT(values.size() >= numPred); // Should be equal, except for quantile (FDB-103)
ASSERT(values.size() <= predicates_.size() + 1);
auto it_value = values.begin();
auto it_pred = predicates_.begin();
for (; it_pred != predicates_.end() && it_value != values.end(); ++it_pred, ++it_value) {
if (values.size() == (predicates_.size() + 1) && (*it_pred)->keyword() == "quantile") {
std::string actualQuantile = *it_value;
++it_value;
if (it_value == values.end()) {
return false;
}
actualQuantile += std::string(":") + (*it_value);
(*it_pred)->fill(key, actualQuantile);
}
else {
(*it_pred)->fill(key, *it_value);
}
}
// Check that everything is exactly consumed
if (it_value != values.end()) {
return false;
}
if (it_pred != predicates_.end()) {
return false;
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| # * Attributes can be typed | ||
| # Globally, at the beginning of this file: | ||
| # Globally, at the begining of this file: |
There was a problem hiding this comment.
Spelling: "begining" should be "beginning" in this comment.
| # Globally, at the begining of this file: | |
| # Globally, at the beginning of this file: |
tests/fdb/etc/fdb/schema
Outdated
| [ step, param, frequency?, direction? ]] | ||
| [ type=me, levtype=ml, timespan?none | ||
| [ step?, number-, levelist, param, coeffindex? ]] | ||
| [ type=4i, levtype=ml, tmespan? |
There was a problem hiding this comment.
This schema line uses tmespan?, which looks like a typo for timespan? (and likely intended to mirror the surrounding timespan?none usage). As written, it will introduce a separate tmespan keyword into the schema, which can break rule matching and indexing for this branch.
| [ type=4i, levtype=ml, tmespan? | |
| [ type=4i, levtype=ml, timespan?none |
tests/fdb/timespan/schema2
Outdated
| [ step, param, frequency?, direction? ]] | ||
| [ type=me, levtype=ml, timespan?none | ||
| [ step?, number-, levelist, param, coeffindex? ]] | ||
| [ type=4i, levtype=ml, tmespan? |
There was a problem hiding this comment.
This schema line uses tmespan?, which looks like a typo for timespan? (and likely intended to be timespan?none like the other branches). As written, it defines a different keyword and will prevent the timespan rules from applying as expected.
| [ type=4i, levtype=ml, tmespan? | |
| [ type=4i, levtype=ml, timespan?none |
| if ( HAVE_GRIB ) | ||
| list( APPEND fdb5_tools | ||
| fdb-hammer | ||
| fdb-patch | ||
| # fdb-patch | ||
| ) |
There was a problem hiding this comment.
fdb-patch has been commented out from the list of built tools under HAVE_GRIB. If this is not intentional, it will remove the executable from builds/installs. If it is intentional, it would help to add a short comment explaining why the tool is being disabled (and ideally gate it behind an option) to avoid an unexpected feature regression.
| int err; | ||
| v = nullptr; | ||
| // skip empty values (optional metadata) | ||
| while (v == nullptr || strlen(v) == 0) { |
There was a problem hiding this comment.
The loop that skips empty optional metadata values can become infinite if fdb_splitkey_next_metadata() returns FDB_ITERATION_COMPLETE (or any error) without setting v to a non-empty string. It also ignores err until after the loop. Consider checking err on each iteration and breaking/failing on non-FDB_SUCCESS, and/or bounding the loop so iteration-complete cannot spin forever.
| int err; | |
| v = nullptr; | |
| // skip empty values (optional metadata) | |
| while (v == nullptr || strlen(v) == 0) { | |
| int err = FDB_SUCCESS; | |
| v = nullptr; | |
| // skip empty values (optional metadata) | |
| while ((v == nullptr || strlen(v) == 0) && err == FDB_SUCCESS) { |
| # * Attributes can be optional | ||
| # [ step, levelist?, param ] | ||
| # They will be replaced internally by an empty value. It is also possible to provide a default substitution value: e.g. [domain?g] will consider the domain to be 'g' if missing. | ||
| # They will be replaced internally by an empty value. It is also posiible to provide a default subtitution value: e.g. [domain?g] will consider the domain to be 'g' if missing. |
There was a problem hiding this comment.
Spelling: "posiible" should be "possible" in this comment.
| # They will be replaced internally by an empty value. It is also posiible to provide a default subtitution value: e.g. [domain?g] will consider the domain to be 'g' if missing. | |
| # They will be replaced internally by an empty value. It is also possible to provide a default subtitution value: e.g. [domain?g] will consider the domain to be 'g' if missing. |
| # * Attributes can be optional | ||
| # [ step, levelist?, param ] | ||
| # They will be replaced internally by an empty value. It is also possible to provide a default substitution value: e.g. [domain?g] will consider the domain to be 'g' if missing. | ||
| # They will be replaced internally by an empty value. It is also posiible to provide a default subtitution value: e.g. [domain?g] will consider the domain to be 'g' if missing. |
There was a problem hiding this comment.
Spelling: "subtitution" should be "substitution" in this comment.
| # They will be replaced internally by an empty value. It is also posiible to provide a default subtitution value: e.g. [domain?g] will consider the domain to be 'g' if missing. | |
| # They will be replaced internally by an empty value. It is also possible to provide a default substitution value: e.g. [domain?g] will consider the domain to be 'g' if missing. |
| std::vector<Key> Rule::findMatchingKeys(const metkit::mars::MarsRequest& request) const { | ||
|
|
||
| RuleGraph graph; | ||
|
|
||
| for (const auto& pred : predicates_) { | ||
|
|
||
| const auto& keyword = pred->keyword(); | ||
| const auto& type = registry_.lookupType(keyword); | ||
|
|
||
|
|
||
| const auto& values = pred->values(request); | ||
| auto values = pred->values(request); | ||
|
|
||
| /// @note do we want to allow empty values? | ||
| // if (values.empty() && pred->optional()) { values.push_back(pred->defaultValue()); } | ||
| /// if the request does not have the keyword, but the predicate is optional, then use the default value | ||
| if (values.empty() && pred->optional()) { | ||
| values.push_back(pred->defaultValue()); | ||
| } | ||
|
|
||
| auto& node = graph.push(keyword); | ||
|
|
||
| for (const auto& value : values) { | ||
| if (pred->match(value)) { | ||
| node.emplace_back(value); | ||
| } | ||
| } | ||
|
|
||
| /// if there are no matching values for this predicate, then there are no matching keys for the rule | ||
| if (node.empty()) { | ||
| return {}; | ||
| } |
There was a problem hiding this comment.
findMatchingKeys(const MarsRequest&) uses pred->values(request) directly. For optional-with-default predicates, MatchOptional::values() returns the default when the keyword is missing, so this path never considers the empty-string alternative. That can prevent tools that rely on Schema::expandDatabase() (e.g. fdb-root) from matching databases indexed with an omitted optional (stored as empty) when the request omits the keyword. Align this implementation with the ReadVisitor overload: explicitly detect missing keywords and add both defaultValue() and "" (when appropriate) to the candidate values.
…e index
Description
Contributor Declaration
By opening this pull request, I affirm the following:
🌈🌦️📖🚧 Documentation FDB 🚧📖🌦️🌈
https://sites.ecmwf.int/docs/dev-section/fdb/pull-requests/PR-246