fix(dataset): ensure that fields required for the lookups are included #2212

abdimo101 · 2025-09-16T11:31:18Z

Description

Fixes dataset V4 findOne endpoint to preserve fields required for relationship lookups.

Motivation

When users request specific fields (e.g. only datasetName) while including relationships like attachments in their query, the system was filtering out required fields (like pid) needed to establish these relationships.

This resulted in empty relationship arrays being returned even when relationships existed in the database.

Fixes

Bug fixed (#X)

Changes:

changes made

Tests included

Included for each change/fix?
Passing?

Documentation

swagger documentation updated (required for API changes)
official documentation updated

official documentation info

emigun · 2025-09-16T14:45:06Z

Looks good to me! (but I feel like someone with more knowledge on MongoDB lookups should be approving)

Junjiequan

I think it'd be better if the solution were more generic so that other controllers can use it, but I think it works, and maybe we can refactor it later

src/datasets/datasets.service.ts

HayenNico · 2025-09-17T09:57:14Z

To me it looks like this issue comes from a messed up order of operations in the mongo pipeline in findOneComplete in datasets.service. What needs to change is that the fieldsProjection comes after the lookup in the pipeline.
If you compare the current findOneComplete to findAllComplete, the latter uses a different order that should always work

I don't think we need to (or should) introduce this exception for required fields

Junjiequan · 2025-09-17T12:15:38Z

> To me it looks like this issue comes from a messed up order of operations in the mongo pipeline in findOneComplete in datasets.service. What needs to change is that the fieldsProjection comes after the lookup in the pipeline.
> If you compare the current findOneComplete to findAllComplete, the latter uses a different order that should always work

@HayenNico Yes, if we change the order of fieldsProjection this will solve the issue, as long as whoever calls this endpoint knows that in order to get the lookup data they must provide the lookup document name in the $project.

For example, after changing the order like this:

const pipeline: PipelineStage[] = [{ $match: whereFilter }];

this.addLookupFields(pipeline, filter.include);

if (!isEmpty(fieldsProjection)) {
  const projection = parsePipelineProjection(fieldsProjection);
  pipeline.push({ $project: projection });
}

if (!isEmpty(limits.sort)) {
  const sort = parsePipelineSort(limits.sort);
  pipeline.push({ $sort: sort });
}

The following query returns the expected results:

{
  "where": {
    "datasetName": {
      "$regex": "test22222233333334",
      "$options": "i"
    }
  },
  "include": [
    "attachments"
  ],
  "fields": [
    "datasetName",
    "attachments"
  ]
}

But this one does not return attachments:

{
  "where": {
    "datasetName": {
      "$regex": "test22222233333334",
      "$options": "i"
    }
  },
  "include": [
    "attachments"
  ],
  "fields": [
    "datasetName"
  ]
}

It’s because lookup fields/embedded documents require the embedded field to be in the $project, otherwise it only returns the root doc.

It kind of goes against the first instinct of how it is supposed to work. For example, the general behaviour of $project is that if you don’t provide a projection, Mongo returns the entire doc. So the idea behind this change is that we keep $project after $lookup because we know which lookup fields are provided, which corresponding IDs are required, and that the required ID fields will not change. This way, people who don’t know Mongo query logic won’t get confused.
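
The ordering effects described in this comment can be reproduced with a small in-memory sketch. This is not the service code: `project` and `lookup` are simplified stand-ins for the `$project` and `$lookup` stages (top-level fields only, no implicit `_id`), and the `datasetId` foreign key and the sample documents are assumptions made for illustration.

```typescript
type Doc = Record<string, unknown>;

// Simplified inclusion-style $project: keep only the listed top-level fields.
function project(doc: Doc, fields: string[]): Doc {
  const out: Doc = {};
  for (const field of fields) {
    if (field in doc) out[field] = doc[field];
  }
  return out;
}

// Simplified $lookup: attach every foreign document whose foreignField
// equals the local document's localField.
function lookup(
  doc: Doc,
  foreign: Doc[],
  localField: string,
  foreignField: string,
  as: string,
): Doc {
  const matches = foreign.filter((f) => f[foreignField] === doc[localField]);
  return { ...doc, [as]: matches };
}

// Hypothetical sample data; "datasetId" as the join key is an assumption.
const dataset: Doc = { pid: "ds-1", datasetName: "test", owner: "alice" };
const attachments: Doc[] = [{ datasetId: "ds-1", caption: "thumbnail" }];

// $project before $lookup: pid is projected away, so the join finds nothing.
const projectThenLookup = lookup(
  project(dataset, ["datasetName"]),
  attachments,
  "pid",
  "datasetId",
  "attachments",
);

// $lookup before $project, relation not listed in fields: it is projected away.
const joined = lookup(dataset, attachments, "pid", "datasetId", "attachments");
const withoutRelation = project(joined, ["datasetName"]);

// $lookup before $project, relation listed in fields: it survives.
const withRelation = project(joined, ["datasetName", "attachments"]);
```

In this sketch the first case yields an empty `attachments` array, the second has no `attachments` key at all, and only the third returns the joined documents.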

Junjiequan · 2025-09-17T12:24:26Z

That being said, I agree with your point. As a maintainer, changing the order is the simplest and cleanest solution.

HayenNico · 2025-09-17T12:42:35Z

@Junjiequan Thanks for the explanation. Assuming we put the lookup pipeline step in the correct place, what we could add is that any filter under "include" is automatically put into the "fields" filter as well. I think this implicit assumption that a client who specified "include" would also want to see that in the projection is valid.
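
A minimal sketch of that idea, assuming `include` and `fields` are plain string arrays at this point. The helper name `addIncludesToFields` is hypothetical and not part of the codebase. An empty `fields` list is returned unchanged, since the pipeline only adds a `$project` stage when a projection was requested.

```typescript
// Hypothetical helper: append every included relation name to the fields list
// so that a later $project stage keeps the looked-up documents.
function addIncludesToFields(fields: string[], include: string[]): string[] {
  // No projection requested: the full document (relations included) is returned anyway.
  if (fields.length === 0) return fields;
  // A Set removes duplicates while preserving insertion order.
  return Array.from(new Set([...fields, ...include]));
}

const merged = addIncludesToFields(["datasetName"], ["attachments"]);
const alreadyListed = addIncludesToFields(
  ["datasetName", "attachments"],
  ["attachments"],
);
```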

HayenNico · 2025-09-17T12:51:41Z

Side note: This will require some adjustments to the tests as well. Right now 2500-0207 on findAll and 2500-0308 on findOne for using all query filters have exactly opposite logic with regard to the include fields, which is why the problem with empty includes wasn't picked up in testing.

Junjiequan · 2025-09-17T17:23:27Z

> @Junjiequan Thanks for the explanation. Assuming we put the lookup pipeline step in the correct place, what we could add is that any filter under "include" is automatically put into the "fields" filter as well. I think this implicit assumption that a client who specified "include" would also want to see that in the projection is valid.

That sounds like a better solution. Just one additional thing needs to be clarified: if a user provides the query below, there will be an error

  "include": [
    "attachments"
  ],
  "fields": [
    "datasetName",
    "attachments.thumbnail"
  ]

This is because "You cannot specify both an embedded document and a field within that embedded document in the same projection" (doc). So if a user provides a field within an embedded document, the document name should not be automatically added to the fields.

@abdimo101 What do you think about this suggestion?

nitrosx · 2025-09-17T18:57:59Z

My two cents,

we should do lookup before projection.
if somebody provides the following input:

{
  "where": {
    "datasetName": {
      "$regex": "test22222233333334",
      "$options": "i"
    }
  },
  "include": [
    "attachments"
  ],
  "fields": [
    "datasetName",
    "attachments"
  ]
}

or

{
  "where": {
    "datasetName": {
      "$regex": "test22222233333334",
      "$options": "i"
    }
  },
  "include": [
    "attachments"
  ],
  "fields": [
    "datasetName",
    "attachments.thumbnail"
  ]
}

we know what to do.

if a user provides the following input:

{
  "where": {
    "datasetName": {
      "$regex": "test22222233333334",
      "$options": "i"
    }
  },
  "include": [
    "attachments"
  ],
  "fields": [
    "datasetName"
  ]
}

we can infer that they have forgotten to ask for the attachments, and internally we should insert attachments in the projection as if the input passed were:

{
  "where": {
    "datasetName": {
      "$regex": "test22222233333334",
      "$options": "i"
    }
  },
  "include": [
    "attachments"
  ],
  "fields": [
    "datasetName",
    "attachments"
  ]
}

Of course everything should be documented in the swagger endpoint, including the reference to the path collision mentioned by @Junjiequan

abdimo101 · 2025-09-18T08:31:23Z

> > @Junjiequan Thanks for the explanation. Assuming we put the lookup pipeline step in the correct place, what we could add is that any filter under "include" is automatically put into the "fields" filter as well. I think this implicit assumption that a client who specified "include" would also want to see that in the projection is valid.
>
> That sounds like a better solution. Just one additional thing needs to be clarified: if a user provides the query below, there will be an error
>
>   "include": [
>     "attachments"
>   ],
>   "fields": [
>     "datasetName",
>     "attachments.thumbnail"
>   ]
>
> This is because "You cannot specify both an embedded document and a field within that embedded document in the same projection" (doc). So if a user provides a field within an embedded document, the document name should not be automatically added to the fields.
>
> @abdimo101 What do you think about this suggestion?

Yes. To avoid the path collision, we have to add a check: when a field within an embedded document (e.g. attachments.thumbnail) is already in fields, the embedded document itself must not be added automatically, even if it is listed under include.
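
The check could look roughly like the sketch below. The function names are hypothetical, and this is not necessarily how the PR implements it. The only rule encoded is the one stated above: skip the relation name when a dotted path under it is already requested.

```typescript
// True when the client already asked for a nested path such as "attachments.thumbnail".
function hasNestedField(fields: string[], relation: string): boolean {
  return fields.some((field) => field.startsWith(`${relation}.`));
}

// Hypothetical helper: add included relations to fields unless that would put
// both "relation" and "relation.child" into the same projection.
function addIncludesWithoutCollision(
  fields: string[],
  include: string[],
): string[] {
  const result = [...fields];
  for (const relation of include) {
    const alreadyListed = result.includes(relation);
    if (!alreadyListed && !hasNestedField(result, relation)) {
      result.push(relation);
    }
  }
  return result;
}

// "attachments" is not added because "attachments.thumbnail" is present.
const nested = addIncludesWithoutCollision(
  ["datasetName", "attachments.thumbnail"],
  ["attachments"],
);

// "attachments" is added because nothing under it was requested.
const plain = addIncludesWithoutCollision(["datasetName"], ["attachments"]);
```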

HayenNico

Looks good

minottic

Sorry, I overlooked this. I think that now that "scope" is supported in the dataset endpoint, a user would filter and project using that (see this test), so I believe the relation fields in the parent should not be supported (also because scope enables more than that, e.g. filters and limits on relations). I think all findOneComplete should simply reuse the findAllComplete, see here. My solution started with datasets; I agree this concept should later be applied everywhere. I sympathize with defaulting to include relations when they are in include but not in fields, but then I think this is the only thing this PR should do, on top of the scopes and reusing findAllComplete.

test/DatasetV4.js

src/attachments/attachments.service.ts

abdimo101 · 2025-09-29T14:18:28Z

Hi @minottic, I had a discussion with @nitrosx & @Junjiequan about your changes. We concluded that the fields inside scope are not really necessary, as they only apply to fields in lookup documents and not the parent (i.e. dataset). The only way to filter parent fields like datasetName is to use the fields array outside the scope, which means users would need to specify fields in two places.

This becomes even more challenging when multiple relations are involved, as users would have to update several different fields arrays. For example:

{
  "where": {
    "datasetName": {
      "$regex": "Dataset",
      "$options": "i"
    }
  },
  "include": [
    "origdatablock",
    {
      "relation": "attachments",
      "scope": {
        "fields": [
          "filename",
          "mimetype"
        ],
        "limits": {
          "limit": 5,
          "skip": 0,
          "sort": {
            "filename": "asc"
          }
        },
        "where": {
          "filename": {
            "$regex": "data",
            "$options": "i"
          }
        }
      }
    },
    {
      "relation": "proposals",
      "scope": {
        "fields": [
          "title",
          "abstract"
        ],
        "limits": {
          "limit": 1,
          "skip": 0,
          "sort": {
            "title": "asc"
          }
        },
        "where": {
          "title": {
            "$regex": "ESS",
            "$options": "i"
          }
        }
      }
    }
  ],
  "fields": [
    "datasetName"
  ],
  "limits": {
    "limit": 10,
    "skip": 0,
    "sort": {
      "datasetName": "asc | desc"
    }
  }
}

This approach requires users to handle multiple fields arrays; we believe it would be simpler for users to handle only one.

First improvement:

{
  "where": {
    "datasetName": {
      "$regex": "Dataset",
      "$options": "i"
    }
  },
  "include": [
    "origdatablock",
    {
      "relation": "attachments",
      "scope": {
        "limits": {
          "limit": 5,
          "skip": 0,
          "sort": {
            "filename": "asc"
          }
        },
        "where": {
          "filename": {
            "$regex": "data",
            "$options": "i"
          }
        }
      }
    },
    {
      "relation": "proposals",
      "scope": {
        "limits": {
          "limit": 1,
          "skip": 0,
          "sort": {
            "title": "asc"
          }
        },
        "where": {
          "title": {
            "$regex": "ESS",
            "$options": "i"
          }
        }
      }
    }
  ],
  "fields": [
    "datasetName",
    "attachments.filename",
    "attachments.mimetype",
    "proposals.title",
    "proposals.abstract"
  ],
  "limits": {
    "limit": 10,
    "skip": 0,
    "sort": {
      "datasetName": "asc | desc"
    }
  }
}
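
To make the relationship between the two filter shapes above concrete, here is a sketch that rewrites the scoped form into the single-`fields` form. The type names and `hoistScopeFields` are hypothetical and exist only for this illustration. It moves each `scope.fields` entry to the top level as `relation.field` and leaves the rest of the scope untouched.

```typescript
interface Scope {
  fields?: string[];
  [key: string]: unknown;
}
interface RelationInclude {
  relation: string;
  scope?: Scope;
}
type Include = string | RelationInclude;
interface Filter {
  include?: Include[];
  fields?: string[];
  [key: string]: unknown;
}

// Hypothetical helper: hoist per-relation scope.fields into the top-level
// fields array using dotted paths such as "attachments.filename".
function hoistScopeFields(filter: Filter): Filter {
  const fields = [...(filter.fields ?? [])];
  const include = (filter.include ?? []).map((entry): Include => {
    if (typeof entry === "string" || !entry.scope) return entry;
    const { fields: scopedFields = [], ...restOfScope } = entry.scope;
    for (const field of scopedFields) {
      fields.push(`${entry.relation}.${field}`);
    }
    return { ...entry, scope: restOfScope };
  });
  return { ...filter, include, fields };
}

const hoisted = hoistScopeFields({
  include: [
    "origdatablock",
    {
      relation: "attachments",
      scope: {
        fields: ["filename", "mimetype"],
        where: { filename: { $regex: "data", $options: "i" } },
      },
    },
  ],
  fields: ["datasetName"],
});
```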

And we also talked about implementing this format not in this PR, but in the future :

{
  "where": {
    "datasetName": {
      "$regex": "Dataset",
      "$options": "i"
    }
  },
  "include": [
    "origdatablock",
    {
      "collection": "attachments",
      "limits": {
        "limit": 5,
        "skip": 0,
        "sort": {
          "filename": "asc"
        }
      },
      "where": {
        "filename": {
          "$regex": "data",
          "$options": "i"
        }
      }
    },
    {
      "collection": "proposals",
      "limits": {
        "limit": 1,
        "skip": 0,
        "sort": {
          "title": "asc"
        }
      },
      "where": {
        "title": {
          "$regex": "ESS",
          "$options": "i"
        }
      }
    }
  ],
  "fields": [
    "datasetName",
    "attachments.filename",
    "attachments.mimetype",
    "proposals.title",
    "proposals.abstract"
  ],
  "limits": {
    "limit": 10,
    "skip": 0,
    "sort": {
      "datasetName": "asc | desc"
    }
  }
}
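
For comparison with the current syntax, the proposed future format amounts to dropping the `scope` wrapper and renaming `relation` to `collection`. The sketch below shows that mapping for a single include entry. The types and `flattenInclude` are hypothetical, and nothing like this is part of this PR.

```typescript
interface ScopedInclude {
  relation: string;
  scope?: Record<string, unknown>;
}
interface FlatInclude {
  collection: string;
  [key: string]: unknown;
}

// Hypothetical mapping from the current include entry to the proposed one:
// the scope's keys move up one level and "relation" becomes "collection".
function flattenInclude(entry: string | ScopedInclude): string | FlatInclude {
  if (typeof entry === "string") return entry;
  // collection is written last so a stray "collection" key in scope cannot override it.
  return { ...(entry.scope ?? {}), collection: entry.relation };
}

const flat = flattenInclude({
  relation: "attachments",
  scope: {
    limits: { limit: 5, skip: 0, sort: { filename: "asc" } },
    where: { filename: { $regex: "data", $options: "i" } },
  },
}) as FlatInclude;
```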

minottic · 2025-09-29T14:35:13Z

@abdimo101 fields will be both in scope and at the upper level in my (merged) PR, so the user will need a more complex JSON, but this will be way closer to the V3 implementation (and backward compatible with the old backend). But I sympathize with having the ability to add fields at the uppermost level as well, and I have to take back that it should not be supported. I think it would be a bit easier if we split the features.

we already support scopes with where and fields, as loopback 3 did and as v3 should do (thus the easy extension to v4). See PR feat: enable scoped search on relations #2231. This includes adding where, limits and fields on relations for datasets
we need to implicitly add relations to fields when used in the include (and I think this is the only thing this PR should do)
we should avoid code duplication, and this is what chore: refactor findOneComplete to avoid dup #2236 does
we need to extend parent fields and use parent.children in fields. This could either be here or in another PR (I think it's better in another PR, but I am also fine having it here)

btw, this

> And we also talked about implementing this format not in this PR, but in the future :

is already supported by this PR for datasets #2231. But if you mean here just making the filter syntax a bit easier, that could work, even though I am not sure it's worth asking users to change all their syntax when migrating from v3 to v4 for such a cosmetic change (you are essentially "stripping" the "scope" and renaming "relation" to "collection"; for example, see how simple it becomes to extend to v3 here #2237). I agree it should be extended to other collections, but as I commented in this PR #2236, I think a more complete change is to refactor all find methods in all controllers to use a shared findAllComplete and findOneComplete that will reuse the pipelines.

I will be happy to discuss this in a meeting next week

abdimo101 · 2025-09-29T15:04:49Z

Hi @minottic, thanks for the quick response. I pushed my last commit before seeing your response, but I’m open to discussing this further next week.

nitrosx · 2025-09-29T15:17:14Z

@minottic I understand that you would like to maintain the same filter syntax in v4.
I would need more details to understand the context for this decision.
That said, I see dataset v4 as the right chance to move away completely from the loopback syntax, which is unintuitive and complex, and does not allow complex queries.

minottic

sorry, I missed the refactor of the PR. Looks good to me, thanks for the change!

abdimo101 added 3 commits September 16, 2025 12:48

ensures fields required for the lookups are included

6475b3a

changed comments and eslint fix

6aa1deb

API test added

497573e

abdimo101 requested a review from Junjiequan September 16, 2025 11:36

Junjiequan approved these changes Sep 17, 2025

src/datasets/datasets.service.ts Outdated

abdimo101 added 3 commits September 17, 2025 19:46

ensures fields required for the lookups are included

9c5f112

changed comments and eslint fix

3d7a81b

API test added

48af15a

Junjiequan force-pushed the fix-preserve-lookup-fields-for-relationships branch from 497573e to 48af15a Compare September 17, 2025 17:47

abdimo101 added 8 commits September 18, 2025 15:43

modified parsePipelineProjection to add include field to the projection

646ac51

Merge branch 'fix-preserve-lookup-fields-for-relationships' of https:…

6e101b1

…//github.com/SciCatProject/scicat-backend-next into fix-preserve-lookup-fields-for-relationships

fixed failing tests

c779675

changes to fix failing tests + added a new api test

4a1e565

Merge branch 'master' into fix-preserve-lookup-fields-for-relationships

39790ff

eslint fix

404c4a9

removed .only from api test

da48ded

fixed api test

cf6c02b

HayenNico approved these changes Sep 22, 2025

Merge branch 'master' into fix-preserve-lookup-fields-for-relationships

1216b51

HayenNico mentioned this pull request Sep 26, 2025

chore: refactor findOneComplete to avoid dup #2236

Open

4 tasks

minottic reviewed Sep 26, 2025

test/DatasetV4.js

src/attachments/attachments.service.ts Outdated

Merge branch 'master' into fix-preserve-lookup-fields-for-relationships

427685f

abdimo101 requested a review from a team as a code owner September 29, 2025 11:25

removed fields inside scope and added some requested changes

0388494

abdimo101 added 4 commits October 3, 2025 17:09

reverted back some changes to only fix the bug, also added error hand…

0d262ab

…ling for path collision and a warning in swagger

eslint fix

34ee21b

fixed failing api tests

9344d17

Merge branch 'master' into fix-preserve-lookup-fields-for-relationships

ff897ba

minottic approved these changes Oct 14, 2025

Merge branch 'master' into fix-preserve-lookup-fields-for-relationships

0b1bde8

abdimo101 merged commit 6d27903 into master Oct 15, 2025
13 checks passed

abdimo101 deleted the fix-preserve-lookup-fields-for-relationships branch October 15, 2025 07:42

6 participants
