abdimo101
Member

@abdimo101 abdimo101 commented Sep 16, 2025

Description

Fixes dataset V4 findOne endpoint to preserve fields required for relationship lookups.

Motivation

When users request specific fields (e.g. only datasetName) while including relationships like attachments in their query, the system was filtering out required fields (like pid) needed to establish these relationships.

This resulted in empty relationship arrays being returned even when relationships existed in the database.

Fixes

  • Bug fixed (#X)

Changes:

  • changes made

Tests included

  • Included for each change/fix?
  • Passing?

Documentation

  • swagger documentation updated (required for API changes)
  • official documentation updated

official documentation info

@emigun
Member

emigun commented Sep 16, 2025

Looks good to me! (but I feel like someone with more knowledge on MongoDB lookups should be approving)

Member

@Junjiequan Junjiequan left a comment

I think it'd be better if the solution were more generic so that other controllers can use it, but I think it works, and maybe we can refactor it later

@HayenNico
Member

To me it looks like this issue comes from a messed up order of operations in the mongo pipeline in findOneComplete in datasets.service. What needs to change is that the fieldsProjection comes after the lookup in the pipeline.
If you compare the current findOneComplete to findAllComplete, the latter uses a different order that should always work

I don't think we need to (or should) introduce this exception for required fields

@Junjiequan
Member

Junjiequan commented Sep 17, 2025

To me it looks like this issue comes from a messed up order of operations in the mongo pipeline in findOneComplete in datasets.service. What needs to change is that the fieldsProjection comes after the lookup in the pipeline.
If you compare the current findOneComplete to findAllComplete, the latter uses a different order that should always work

@HayenNico Yes, if we change the order of fieldsProjection this will solve the issue, as long as whoever calls this endpoint knows that in order to get the lookup data they must provide the lookup document name in the $project.

For example, after changing the order like this:

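// Start with the match stage so the lookups only run on matching datasets.
const pipeline: PipelineStage[] = [{ $match: whereFilter }];

// Add the $lookup stages for the requested relations before projecting.
this.addLookupFields(pipeline, filter.include);

// Project only after the lookups, so looked-up fields can be selected.
if (!isEmpty(fieldsProjection)) {
  const projection = parsePipelineProjection(fieldsProjection);
  pipeline.push({ $project: projection });
}

if (!isEmpty(limits.sort)) {
  const sort = parsePipelineSort(limits.sort);
  pipeline.push({ $sort: sort });
}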

The following query returns the expected results:

{
  "where": {
    "datasetName": {
      "$regex": "test22222233333334",
      "$options": "i"
    }
  },
  "include": [
    "attachments"
  ],
  "fields": [
    "datasetName",
    "attachments"
  ]
}

But this one does not return attachments:

{
  "where": {
    "datasetName": {
      "$regex": "test22222233333334",
      "$options": "i"
    }
  },
  "include": [
    "attachments"
  ],
  "fields": [
    "datasetName"
  ]
}

It’s because an inclusion $project drops every field that is not listed, so the lookup field/embedded document has to be in the $project; otherwise it only returns the root doc.

It’s kinda against the first instinct of how it’s supposed to work. For example, the general use case of $project is that if you don’t provide a projection, Mongo returns the entire doc. So, the idea behind this change is that we keep $project after $lookup because we know which lookup fields are provided, which corresponding IDs are required, and that the required ID fields will not be changed. This way people who don’t know Mongo query logic won’t get confused.
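
A minimal sketch of the behaviour described above, assuming a simplified top-level inclusion projection applied to a plain object. `projectInclude`, the sample document, and its values are hypothetical and only stand in for what `$project` does after `$lookup`; no MongoDB call is made, and the default `_id` handling is ignored.

```typescript
// Hypothetical stand-in for an inclusion-style $project on top-level fields.
// Real MongoDB also keeps _id by default; that detail is left out here.
type Doc = Record<string, unknown>;

function projectInclude(doc: Doc, fields: string[]): Doc {
  const out: Doc = {};
  for (const field of fields) {
    // Only fields that are explicitly listed survive the projection.
    if (field in doc) {
      out[field] = doc[field];
    }
  }
  return out;
}

// A dataset document as it might look after $lookup added "attachments".
const afterLookup: Doc = {
  pid: "example-pid",
  datasetName: "test22222233333334",
  attachments: [{ caption: "example" }],
};

// "attachments" is listed, so the looked-up array is kept.
const withAttachments = projectInclude(afterLookup, ["datasetName", "attachments"]);

// "attachments" is not listed, so it is dropped even though "include" asked for it.
const withoutAttachments = projectInclude(afterLookup, ["datasetName"]);
```

The second call mirrors the query that returns no attachments: the lookup ran, but the projection removed its result.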

@Junjiequan
Member

That being said, I agree with your point. As a maintainer, changing the order is the simplest and cleanest solution.

@HayenNico
Member

@Junjiequan Thanks for the explanation. Assuming we put the lookup pipeline step in the correct place, what we could add is that any filter under "include" is automatically put into the "fields" filter as well. I think this implicit assumption that a client who specified "include" would also want to see that in the projection is valid.

@HayenNico
Member

Side note: This will require some adjustments to the tests as well. Right now 2500-0207 on findAll and 2500-0308 on findOne for using all query filters have exactly opposite logic with regard to the include fields, which is why the problem with empty includes wasn't picked up in testing.

@Junjiequan
Member

Junjiequan commented Sep 17, 2025

@Junjiequan Thanks for the explanation. Assuming we put the lookup pipeline step in the correct place, what we could add is that any filter under "include" is automatically put into the "fields" filter as well. I think this implicit assumption that a client who specified "include" would also want to see that in the projection is valid.

That sounds like a better solution. Just one additional thing needs to be clarified: if a user provides the query below, there will be an error

  "include": [
    "attachments"
  ],
  "fields": [
    "datasetName",
    "attachments.thumbnail"
  ]

Because "You cannot specify both an embedded document and a field within that embedded document in the same projection." (doc), so if a user provides a field within an embedded document, the document name should not be automatically put into the fields.

@abdimo101 What do you think about this suggestion?

@Junjiequan Junjiequan force-pushed the fix-preserve-lookup-fields-for-relationships branch from 497573e to 48af15a Compare September 17, 2025 17:47
@nitrosx
Member

nitrosx commented Sep 17, 2025

My two cents,

  • we should do lookup before projection.
  • if somebody provides the following input:
{
  "where": {
    "datasetName": {
      "$regex": "test22222233333334",
      "$options": "i"
    }
  },
  "include": [
    "attachments"
  ],
  "fields": [
    "datasetName",
    "attachments"
  ]
}

or

{
  "where": {
    "datasetName": {
      "$regex": "test22222233333334",
      "$options": "i"
    }
  },
  "include": [
    "attachments"
  ],
  "fields": [
    "datasetName",
    "attachments.thumbnail"
  ]
}

we know what to do.

  • if a user provides the following input:
{
  "where": {
    "datasetName": {
      "$regex": "test22222233333334",
      "$options": "i"
    }
  },
  "include": [
    "attachments"
  ],
  "fields": [
    "datasetName"
  ]
}

we can infer that they forgot to ask for the attachments, and internally we should insert attachments in the projection, as if the input passed was:

{
  "where": {
    "datasetName": {
      "$regex": "test22222233333334",
      "$options": "i"
    }
  },
  "include": [
    "attachments"
  ],
  "fields": [
    "datasetName",
    "attachments"
  ]
}

Of course everything should be documented in the swagger endpoint, including the reference to the path collision mentioned by @Junjiequan

@abdimo101
Member Author

@Junjiequan Thanks for the explanation. Assuming we put the lookup pipeline step in the correct place, what we could add is that any filter under "include" is automatically put into the "fields" filter as well. I think this implicit assumption that a client who specified "include" would also want to see that in the projection is valid.

That sounds like a better solution. Just one additional thing needs to be clarified: if a user provides the query below, there will be an error

  "include": [
    "attachments"
  ],
  "fields": [
    "datasetName",
    "attachments.thumbnail"
  ]

Because "You cannot specify both an embedded document and a field within that embedded document in the same projection." (doc), so if a user provides a field within an embedded document, the document name should not be automatically put into the fields.

@abdimo101 What do you think about this suggestion?

Yes, to avoid the path collision we have to add a check: when a field within an embedded document (e.g. attachments.thumbnail) is already in fields, the embedded document itself must not be added automatically, even if it's under include.
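
A minimal sketch of the check discussed above, assuming `fields` and `include` are plain string arrays. `mergeIncludeIntoFields` is a hypothetical helper name, not the function used in the PR, and object-style `include` entries are not handled.

```typescript
// Hypothetical helper: adds each included relation to the projection
// unless the caller already selected it or one of its sub-fields.
function mergeIncludeIntoFields(fields: string[], include: string[]): string[] {
  // No explicit fields means no projection at all, so nothing needs adding.
  if (fields.length === 0) {
    return [];
  }

  const merged = [...fields];
  for (const relation of include) {
    const alreadyCovered = fields.some(
      // "attachments" itself, or a nested path such as "attachments.thumbnail".
      (field) => field === relation || field.indexOf(`${relation}.`) === 0,
    );
    // Adding "attachments" next to "attachments.thumbnail" would trigger
    // the path collision error, so the relation is skipped in that case.
    if (!alreadyCovered) {
      merged.push(relation);
    }
  }
  return merged;
}

// The relation is missing from fields, so it is added.
const inferred = mergeIncludeIntoFields(["datasetName"], ["attachments"]);

// A nested field is already requested, so the relation is left out.
const nested = mergeIncludeIntoFields(
  ["datasetName", "attachments.thumbnail"],
  ["attachments"],
);
```

Returning an empty array when no fields are given keeps the "no projection returns the entire doc" behaviour, which is a design assumption of this sketch.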

Member

@HayenNico HayenNico left a comment

Looks good

Member

@minottic minottic left a comment

Sorry, I overlooked this. I think now that the "scope" is supported in the dataset endpoint, a user would filter and project using that (see this test), so I believe the relation fields in the parent should not be supported (also because scope enables more than that, e.g. filters and limits on relations). I think all findOneComplete should simply reuse the findAllComplete, see here. My solution started with datasets; I agree this concept should later be applied everywhere. I sympathise with defaulting to include relations when they are in include but not in fields, but then I think this is the only thing this PR should do, on top of the scopes and reusing findAllComplete.

@abdimo101 abdimo101 requested a review from a team as a code owner September 29, 2025 11:25
@abdimo101
Member Author

Hi @minottic, I had a discussion with @nitrosx & @Junjiequan about your changes. We concluded that the fields inside scope are not really necessary, as they only apply to fields in lookup documents and not the parent (i.e. dataset). The only way to filter parent fields like datasetName is to use the fields array outside the scope, which means users would need to specify fields in two places.

This becomes even more challenging when multiple relations are involved, as users would have to update several different fields arrays. For example:

{
  "where": {
    "datasetName": {
      "$regex": "Dataset",
      "$options": "i"
    }
  },
  "include": [
    "origdatablock",
    {
      "relation": "attachments",
      "scope": {
        "fields": [
          "filename",
          "mimetype"
        ],
        "limits": {
          "limit": 5,
          "skip": 0,
          "sort": {
            "filename": "asc"
          }
        },
        "where": {
          "filename": {
            "$regex": "data",
            "$options": "i"
          }
        }
      }
    },
    {
      "relation": "proposals",
      "scope": {
        "fields": [
          "title",
          "abstract"
        ],
        "limits": {
          "limit": 1,
          "skip": 0,
          "sort": {
            "title": "asc"
          }
        },
        "where": {
          "title": {
            "$regex": "ESS",
            "$options": "i"
          }
        }
      }
    }
  ],
  "fields": [
    "datasetName"
  ],
  "limits": {
    "limit": 10,
    "skip": 0,
    "sort": {
      "datasetName": "asc | desc"
    }
  }
} 

This approach requires users to handle multiple fields arrays; we believe it would be simpler for users to handle only one.

First improvement:

{
  "where": {
    "datasetName": {
      "$regex": "Dataset",
      "$options": "i"
    }
  },
  "include": [
    "origdatablock",
    {
      "relation": "attachments",
      "scope": {
        "limits": {
          "limit": 5,
          "skip": 0,
          "sort": {
            "filename": "asc"
          }
        },
        "where": {
          "filename": {
            "$regex": "data",
            "$options": "i"
          }
        }
      }
    },
    {
      "relation": "proposals",
      "scope": {
        "limits": {
          "limit": 1,
          "skip": 0,
          "sort": {
            "title": "asc"
          }
        },
        "where": {
          "title": {
            "$regex": "ESS",
            "$options": "i"
          }
        }
      }
    }
  ],
  "fields": [
    "datasetName",
    "attachments.filename",
    "attachments.mimetype",
    "proposals.title",
    "proposals.abstract"
  ],
  "limits": {
    "limit": 10,
    "skip": 0,
    "sort": {
      "datasetName": "asc | desc"
    }
  }
}
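
A minimal sketch of how a single top-level fields array like the one above could be split back into parent fields and per-relation fields on the server side. `splitFieldsByRelation` and `SplitFields` are hypothetical names used only for illustration; the thread does not say how (or whether) the backend does this.

```typescript
// Hypothetical result shape: parent fields plus fields grouped per relation.
interface SplitFields {
  parent: string[];
  byRelation: Record<string, string[]>;
}

// Splits dotted paths such as "attachments.filename" by their relation prefix.
// Paths whose prefix is not a known relation stay with the parent document.
function splitFieldsByRelation(fields: string[], relations: string[]): SplitFields {
  const result: SplitFields = { parent: [], byRelation: {} };
  for (const field of fields) {
    const dot = field.indexOf(".");
    const prefix = dot === -1 ? "" : field.slice(0, dot);
    if (dot !== -1 && relations.indexOf(prefix) !== -1) {
      const list = result.byRelation[prefix] ?? [];
      list.push(field.slice(dot + 1));
      result.byRelation[prefix] = list;
    } else {
      result.parent.push(field);
    }
  }
  return result;
}

const split = splitFieldsByRelation(
  [
    "datasetName",
    "attachments.filename",
    "attachments.mimetype",
    "proposals.title",
    "proposals.abstract",
  ],
  ["attachments", "proposals"],
);
```

With this grouping, the user maintains one fields array while each relation still receives only its own field list.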

We also talked about implementing this format, not in this PR but in the future:

{
  "where": {
    "datasetName": {
      "$regex": "Dataset",
      "$options": "i"
    }
  },
  "include": [
    "origdatablock",
    {
      "collection": "attachments",
      "limits": {
        "limit": 5,
        "skip": 0,
        "sort": {
          "filename": "asc"
        }
      },
      "where": {
        "filename": {
          "$regex": "data",
          "$options": "i"
        }
      }
    },
    {
      "collection": "proposals",
      "limits": {
        "limit": 1,
        "skip": 0,
        "sort": {
          "title": "asc"
        }
      },
      "where": {
        "title": {
          "$regex": "ESS",
          "$options": "i"
        }
      }
    }
  ],
  "fields": [
    "datasetName",
    "attachments.filename",
    "attachments.mimetype",
    "proposals.title",
    "proposals.abstract"
  ],
  "limits": {
    "limit": 10,
    "skip": 0,
    "sort": {
      "datasetName": "asc | desc"
    }
  }
}

@minottic
Member

minottic commented Sep 29, 2025

@abdimo101 fields will be both in scope and at the upper level in my (merged) PR, so the user will need a more complex JSON, but this will be way closer to the V3 implementation (and backward compatible with the old backend). But I sympathise with having the ability to add fields at the uppermost level as well, and I have to take back that it should not be supported. I think it would be a bit easier if we split the features.

  1. we already support scopes with where and fields, as loopback 3 did and as v3 should do (thus the easy extension to v4). See PR feat: enable scoped search on relations #2231. This includes adding where, limits and fields on relations for datasets
  2. we need to implicitly add relations to fields when used in the include (and I think this is the only thing this PR should do)
  3. we should avoid code duplication, and this is what chore: refactor findOneComplete to avoid dup #2236 does
  4. we need to extend parent fields and use parent.children in fields. This could either be here or in another PR (I think it's better in another PR, but I am also fine having it here)

btw, this

And we also talked about implementing this format not in this PR, but in the future :

is already supported by this PR for datasets #2231. But if you mean here just making the filter syntax a bit easier, that could work, even though I am not sure it's worth asking users to change all their syntax when migrating from v3 to v4 for such a cosmetic change (you are essentially "stripping" the "scope" and renaming "relation" to "collection". For example, see how simple it becomes to extend to v3 here #2237). I agree it should be extended to other collections, but as I commented in this PR #2236, I think a more complete change is to refactor all find in all controllers to use a shared findAllComplete and findOneComplete that will reuse the pipelines.

I will be happy to discuss this in a meeting next week

@abdimo101
Member Author

Hi @minottic, thanks for the quick response. I pushed my last commit before seeing your response, but I’m open to discussing this further next week.

@nitrosx
Member

nitrosx commented Sep 29, 2025

@minottic I understand that you would like to maintain the same filter syntax in v4.
I would need more details to understand the context for this decision.
That said, I see dataset v4 as the right chance to move away completely from loopback syntax, which is unintuitive, complicated, and does not allow complex queries.

Member

@minottic minottic left a comment

sorry, I missed the refactor of the PR. Looks good to me, thanks for the change!

@abdimo101 abdimo101 merged commit 6d27903 into master Oct 15, 2025
13 checks passed
@abdimo101 abdimo101 deleted the fix-preserve-lookup-fields-for-relationships branch October 15, 2025 07:42
6 participants