Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infinite continuations: sum(*) returns alternating rows forever #3096

Closed
ohadzeliger opened this issue Feb 4, 2025 · 3 comments · Fixed by #3211
Closed

Infinite continuations: sum(*) returns alternating rows forever #3096

ohadzeliger opened this issue Feb 4, 2025 · 3 comments · Fixed by #3211
Assignees
Labels
bug Something isn't working test failure A test is failing at least some of the time

Comments

@ohadzeliger
Copy link
Contributor

Adding maxRows:1 to the yaml test keeps returning two rows forever (the result row and an empty row), the continuation is never atEnd().

Steps To Reproduce:

  1. For example, in the test aggregate-index-tests.yamsql, modify the test to be:
  2.    - query: select max(col2) from T1 use index (mv8);
    - maxRows: 1
    - explain: "ISCAN(MV8 <,>) | MAP (_ AS _0) | AGG (max_l(_._0.COL2) AS _0) | ON EMPTY NULL | MAP (_._0._0 AS _0)"
    - result: [{!l 13}]
    - result: [{!null _}]
    - result: [{!l 13}]
    - result: [{!null _}]
    - result: [{!l 13}]
    - result: [{!null _}]
    - result: [{!l 13}]
    - result: [{!null _}]
    
  3. And the test will always fail with “Query returned more results, but no more were expected after line …”

Note that this behaves differently when running in mixed mode, where the error is:

Caused by: com.google.common.base.VerifyException
at com.google.common.base.Verify.verify(Verify.java:102)
at com.apple.foundationdb.relational.recordlayer.query.QueryPlan$ContinuedPhysicalQueryPlan.validatePlanAgainstEnvironment(QueryPlan.java:520)
at com.apple.foundationdb.relational.recordlayer.query.QueryPlan$PhysicalQueryPlan.executePhysicalPlan(QueryPlan.java:361)
at com.apple.foundationdb.relational.recordlayer.query.QueryPlan$PhysicalQueryPlan.executeInternal(QueryPlan.java:252)
at com.apple.foundationdb.relational.recordlayer.query.QueryPlan$PhysicalQueryPlan.executeInternal(QueryPlan.java:112)

@alecgrieser alecgrieser added bug Something isn't working test failure A test is failing at least some of the time labels Feb 28, 2025
@alecgrieser
Copy link
Collaborator

alecgrieser commented Feb 28, 2025

We now know what the difference between mixed-mode and non-mixed results here is about (it's a testing artifact based on how the framework handles receiving null/starting continuations--it's also been addressed by #3186). The continuation problem has to do with how the aggregate cursor handles continuations.

Namely, this is a consequence of the fact that the AggregateCursor does not remember whether it has already returned a value, so when maxRows is 1, it effectively goes:

  1. First iteration: start from the beginning, return value and continuation pointing to the last record in the scan
  2. Second iteration: pick up from where the previous continuation left off, scanning zero values, so return the default null value with a START continuation (as it always does when it returns the default value)
  3. Third iteration: now we have a start continuation, so go back to step 1

Then this repeats ad infinitum. This is something @pengpeng-lu noted in #3177 (for #3180). It's also notable that I think this should go away when #3107 can be done in 4.2, as this doesn't happen when isCreateDefaultOnEmpty is set to false.

@alecgrieser
Copy link
Collaborator

Also notable: this happens for all aggregates, not just sum

@alecgrieser
Copy link
Collaborator

Actually, my comment about:

It's also notable that I think this should go away when #3107 can be done in 4.2, as this doesn't happen when isCreateDefaultOnEmpty is set to false.

Led me to another discover: isCreateDefaultOnEmpty is a field that is already being set by default on the AggregateCursor on 4.1, but we're still seeing the error. As it turns out, we forgot to serialize the isCreateDefaultOnEmpty configuration, so when the value goes back to the older value, it then defaults to the older behavior. If we serialize that, then we can at least have the continuations be correct going backwards.

@alecgrieser alecgrieser self-assigned this Feb 28, 2025
alecgrieser added a commit to alecgrieser/fdb-record-layer that referenced this issue Feb 28, 2025
…te plan continuations

There is missing state that was introduced to the `AggregateCursor` and its accompanying plan in FoundationDB#3092. However, this state was not added to the continuation, which results in unexpected behavior when we try to deserialize the plan's continuation (effectively, it reverts to the old behavior of the plan operator). We were already correctly _de_serializing this information, so the upshot is that any plan that is started after this current version should exhibit the correct behavior, but we get incorrect behavior when a plan is started on an older version.

This fixes FoundationDB#3096.
alecgrieser added a commit that referenced this issue Feb 28, 2025
…te plan continuations (#3211)

There is missing state that was introduced to the `AggregateCursor` and
its accompanying plan in #3092. However, this state was not added to the
continuation, which results in unexpected behavior when we try to
deserialize the plan's continuation (effectively, it reverts to the old
behavior of the plan operator). We were already correctly
_de_serializing this information, so the upshot is that any plan that is
started after this current version should exhibit the correct behavior,
but we get incorrect behavior when a plan is started on an older
version.

This fixes #3096. I was able to get test coverage of this part of the
code by enabling `FORCE_CONTINUATIONS` mode on `aggregateIndexTests`.
alecgrieser added a commit to alecgrieser/fdb-record-layer that referenced this issue Feb 28, 2025
alecgrieser added a commit to alecgrieser/fdb-record-layer that referenced this issue Mar 13, 2025
…tion deserialization

This cleans up the `StreamingAggregateCursor` and its helper classes to remove the `createDefaultIfEmpty` option. We'd been carrying that option around as we introduced that field in FoundationDB#3092 and wanted to preserve behavior when upgrading from older versions. The intention had been that all new plans would set the option starting in 4.1, but a bug (fixed in FoundationDB#3211; see FoundationDB#3096) means that we didn't actually enable it until 4.1.9.0. That means that after this change, we'd require 4.1.9.0 or newer in order to safely continue these queries. In theory, we could wait with this change, but 4.1.9.0 has enough fixes that I actually think our recommendation should be that anyone upgrading from 4.0 or below go straight to 4.1.9.0, and then they can proceed safely to a newer version with this change.

I was able to validate that this change is compatible with 4.1.9.0 via the cross-version tests run during PRB. When I ran the full mixed mode tests with the `aggregate-index-tests.yamsql`, I found that only 4.1.9.0 worked, which is expected.

This resolves FoundationDB#3092.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working test failure A test is failing at least some of the time
Projects
None yet
2 participants