-
Notifications
You must be signed in to change notification settings - Fork 32
fix(dataset): ensure that fields required for the lookups are included #2212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(dataset): ensure that fields required for the lookups are included #2212
Conversation
Looks good to me! (but I feel like someone with more knowledge on MongoDB lookups should be approving) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be better if the solution were more generic so that other controllers can use it, but I think it works, and maybe we can refactor it later
To me it looks like this issue comes from a messed up order of operations in the mongo pipeline in findOneComplete in datasets.service. What needs to change is that the fieldsProjection comes after the lookup in the pipeline. I don't think we need to (or should) introduce this exception for required fields |
@HayenNico Yes, if we change the order of fieldsProjection this will solve the issue, as long as whoever calls this endpoint knows that in order to get the lookup data they must provide the lookup document name in the $project. For example, after changing the order like this: const pipeline: PipelineStage[] = [{ $match: whereFilter }];
this.addLookupFields(pipeline, filter.include);
if (!isEmpty(fieldsProjection)) {
const projection = parsePipelineProjection(fieldsProjection);
pipeline.push({ $project: projection });
}
if (!isEmpty(limits.sort)) {
const sort = parsePipelineSort(limits.sort);
pipeline.push({ $sort: sort });
} The following query returns the expected results: {
"where": {
"datasetName": {
"$regex": "test22222233333334",
"$options": "i"
}
},
"include": [
"attachments"
],
"fields": [
"datasetName",
"attachments"
]
} But this one do not return attachments: {
"where": {
"datasetName": {
"$regex": "test22222233333334",
"$options": "i"
}
},
"include": [
"attachments"
],
"fields": [
"datasetName"
]
} It’s because lookup fields/embedded documents require the embedded field to be in the $project, otherwise it only returns the root doc. It’s kinda against the first instinct of how it supposed to work. For example, the general use case of $project is that if you don’t provide a projection, Mongo returns the entire doc. So, the idea behind this change is that we keep $project after $lookup because we know what lookup fields list is provided, what corresponding IDs are required, and that the required ids field will not be changed. This way people who don’t know Mongo query logic won’t get confused. |
That being said, I agree with your point. As a maintainer, changing the order is the simplest and cleanest solution. |
@Junjiequan Thanks for the explanation. Assuming we put the lookup pipeline step in the correct place, what we could add is that any filter under |
That sounds like a better solution. Just one additional thing needs to be clarified, if user provide the query below, there will be an error "include": [
"attachments"
],
"fields": [
"datasetName",
"attachments.thumbnail"
] Because @abdimo101 How do you think about this suggestion? |
497573e
to
48af15a
Compare
My two cents,
or
we know that what to do.
we can infer that they have forgot to ask for the attachments and internally we should insert
f course everything should be documented in the swagger endpoint, including the reference to the path collision mentioned by @Junjiequan |
Yes to avoid the path collision, we have to make a check to ensure that when a field within an embedded document(e.g |
…//github.com/SciCatProject/scicat-backend-next into fix-preserve-lookup-fields-for-relationships
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I overlooked this. I think now that the "scope" is supported in the dataset endpoint, a user would filter and project using that (see this test), so the relation fields in the parent I believe should not be supported (also because scope
enables more than that, e.g. filters and limits on relations). I think all findOneComplete
should simply reuse the findAllComplete
, see here. My solution started with datasest, I agree this concept should later be applied everywhere. I simpatise defaulting to include relations when in include but not in field, but then I think this is the only thing this PR should do, on top of the scopes and reusing findAllComplete
Hi @minottic, I had a discussion with @nitrosx & @Junjiequan about your changes. We concluded that the This becomes even more challenging when multiple relations are involved, as users would have to update several different
This approach requires users to handle multiple FIrst improvement:
And we also talked about implementing this format not in this PR, but in the future :
|
@abdimo101 fields will be both in scope and in the upper level in my (merged) PR, so the user will need a more complex json, but this will be way closer to the V3 implementation (and backward compatible with the old backend). But I simpatise with having the ability to add fields in the uppermost level as well, and I have to take back that it should not be supported. I think it would be a bit easier if we split the features.
btw, this
is already supported by this PR for datasets #2231. But if you mean here just making the filter syntax a bit easiear, that could work even though I am not sure it's worth asking users to change all their syntax when migrating from v3 to v4 for such a cosmetic change (you are essentially "stripping" the "scope" and renaming "relation" to "collection". For example, see how simple it becomes to extend to v3 here #2237 ). I agree it should be extended to other collections, but as I commented in this PR #2236 I thikn a more complete change is to refactor all find in all controllers to use a shared findAllComplete and findOneComplete that will reuse the pipelines. I will be happy to discuss this in a meeting next week |
Hi @minottic, thanks for the quick response. I pushed my last commit before seeing your response, but I’m open to discussing this further next week. |
@minottic I understand that you would like to maintain the same filter syntax in v4. |
…ling for path collision and a warning in swagger
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I missed the refactor of the PR. Looks good to me, thanks for the change!
Description
Fixes dataset V4 findOne endpoint to preserve fields required for relationship lookups.
Motivation
When users request specific fields (e.g. only
datasetName
) while including relationships likeattachments
in their query, the system was filtering out required fields (likepid
) needed to establish these relationships.This resulted in empty relationship arrays being returned even when relationships existed in the database.
Fixes
Changes:
Tests included
Documentation
official documentation info