@cglukas cglukas commented Sep 15, 2025

Purpose

The built-in search is capable of excluding search terms. That's a great feature that would make the search a lot better!
Unfortunately, there are two blocking components:

  • splitQuery discards the leading hyphens that mark excluded terms
  • performTermsSearch aborts the whole search as soon as any excluded term is matched
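
The second point can be illustrated with a minimal sketch. The miniature index and the `filterFiles` helper below are hypothetical and only mimic the rough shape of the search data; they are not the project's actual code. The idea is that a matching excluded term should skip a single file (`continue`) rather than stop the whole loop (`break`).

```javascript
// Hypothetical miniature index: term -> file id or list of file ids.
const terms = { example: [0, 1, 2], test: 1 };

// Return the candidate files that contain none of the excluded terms.
// `abortOnMatch` switches between the two behaviours for comparison.
function filterFiles(candidateFiles, excludedTerms, abortOnMatch) {
  const results = [];
  for (const file of candidateFiles) {
    const isExcluded = [...excludedTerms].some((term) =>
      [].concat(terms[term] ?? []).includes(file),
    );
    if (isExcluded) {
      if (abortOnMatch) break; // aborting: all later files are lost too
      continue; // skipping: only this file is dropped
    }
    results.push(file);
  }
  return results;
}

const excluded = new Set(["test"]);
const withBreak = filterFiles([0, 1, 2], excluded, true); // [0]
const withContinue = filterFiles([0, 1, 2], excluded, false); // [0, 2]
```

With `break`, file 2 never gets evaluated even though it does not contain the excluded term.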

References

Closes #13892

@cglukas
Author

cglukas commented Sep 15, 2025

Regarding the CI: I don't see any changes in my PR that would trigger the current CI failure. Is this a common issue? TBH, it does not look like it's affecting the master branch either 🤔. I'm not sure what to do here.

@jayaddison
Contributor

> Regarding the CI: I don't see any changes in my PR that would trigger the current CI failure. Is this a common issue? TBH, it does not look like it's affecting the master branch either 🤔. I'm not sure what to do here.

That's OK, yep - I believe that is due to bug #13886 (in progress, potentially to be fixed by #13883).

@jayaddison
Contributor

A delayed thought here: adding the exclusion operator to hyphenated query terms could cause unexpected results.

For example, the query "example -test-case" currently parses to ["example", "-test", "case"], I think.
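
A minimal sketch of how such a parse could come about, assuming a splitter that breaks a token at a hyphen only when the hyphen follows a word character. `splitQuerySketch` and its regular expression are hypothetical and are not the project's real `splitQuery`.

```javascript
// Hypothetical splitter, not the actual implementation under review.
function splitQuerySketch(query) {
  return query
    .trim()
    .split(/\s+/)
    .filter((token) => token.length > 0)
    // Split on a hyphen only when it follows a word character, so a
    // leading hyphen (the exclusion operator) stays attached to its term.
    .flatMap((token) => token.split(/(?<=\w)-/));
}

const parts = splitQuerySketch("example -test-case");
// parts: ["example", "-test", "case"]
```

Under this assumption only "test" carries the exclusion operator, while "case" becomes an ordinary search term.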

Comment on lines +643 to +652
    [...excludedTerms].some((excludedTerm) => {
      // Both mappings will contain either a single integer or a list of integers.
      // Converting them to lists makes the comparison more readable.
      let excludedTermFiles = [].concat(terms[excludedTerm]);
      let excludedTitleFiles = [].concat(titleTerms[excludedTerm]);
      return (
        excludedTermFiles.includes(file)
        || excludedTitleFiles.includes(file)
      );
    })
Contributor

Suggested change
-   [...excludedTerms].some((excludedTerm) => {
-     // Both mappings will contain either a single integer or a list of integers.
-     // Converting them to lists makes the comparison more readable.
-     let excludedTermFiles = [].concat(terms[excludedTerm]);
-     let excludedTitleFiles = [].concat(titleTerms[excludedTerm]);
-     return (
-       excludedTermFiles.includes(file)
-       || excludedTitleFiles.includes(file)
-     );
-   })
+   [...excludedTerms].some(
+     (term) =>
+       terms[term] === file
+       || titleTerms[term] === file
+       || (terms[term] || []).includes(file)
+       || (titleTerms[term] || []).includes(file),
+   )

Do we need this set of excludedTerms filtering changes? I would suggest not modifying these lines unless strictly necessary. Tests continue to pass when I revert this.

I acknowledge that the break-to-continue fixup is important, though.

Author

OK, I'll need to add another test then. The last conditions raise an error in some cases. I'll add the test soon.

Author

Hi @jayaddison,
I added the "should exclude results where the file index is above 0." test case, which raises a TypeError with the old code. It's very specific to the old implementation.
The error occurs every time terms[term] is a positive integer that is not equal to the file variable. I think this bug was introduced when the data structures were changed from actual file paths/names to file ids.
Strings did not trigger the error in this condition, but integers above 0 evaluate to true, so the fallback to an empty list never applies and includes is called on a number.
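
The failure mode can be reproduced in isolation. The index entry below is hypothetical; the first expression follows the old condition quoted in the review, and the `?? []` guard in the second is an addition for this sketch, not part of the PR.

```javascript
// Hypothetical index entry: the term maps to a single file id, not a list.
const terms = { test: 3 };
const file = 1;

// Old-style check: 3 is truthy, so the `|| []` fallback never applies,
// and numbers have no `includes` method.
let oldError = null;
try {
  (terms["test"] || []).includes(file);
} catch (error) {
  oldError = error; // a TypeError
}

// Normalising to a list first works for a single id, a list, or undefined.
const normalised = [].concat(terms["test"] ?? []);
const isExcluded = normalised.includes(file); // false: file 1 is not file 3
```

A file id of 0 hides the problem, because `0 || []` falls back to the empty list; this is why a test needs a file index above 0 to expose it.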

cglukas and others added 4 commits November 2, 2025 14:53
No change in functionality intended.
Only update the automatically generated files which might be forgotten.
Use lookbehind to split words only if a hyphen is contained inside a word.

Co-authored-by: James Addison <[email protected]>
@cglukas
Author

cglukas commented Nov 2, 2025

> A delayed thought here: adding the exclusion operator to hyphenated query terms could cause unexpected results.
>
> For example, the query "example -test-case" currently parses to ["example", "-test", "case"], I think.

Hi @jayaddison,
that's a valid concern. I can imagine two scenarios:

  1. The word "case" gets excluded as well.

    • ✅ Benefit: This would work without a major change in the data structures.
    • ❌ Downside: We could no longer find a page containing the words "example" and "case", because "case" would be excluded even though we never wanted to exclude all occurrences of "case".
  2. The excluded word would be "test-case".

    • ✅ Benefit: We can actually exclude "test-case", and the limitation from above is fixed.
    • ❌ Downside: I think we would need one of two things. Either we modify the search index generation, because hyphenated words are not stored; this would probably introduce a lot of unwanted changes. Or we add a mechanism to chain excluded words together. Right now they are all chained with an OR condition. To get "test" and "case" to work, we would need to add an AND condition and an additional attribute that defines this for each excluded word (could also be a mapping).
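
The chaining mechanism from the second scenario could be sketched roughly as follows. Everything here is hypothetical (the miniature index, the group representation, and `isExcludedByGroups`); it only illustrates AND within one hyphenated word and OR across excluded words.

```javascript
// Hypothetical miniature index: term -> list of file ids.
const terms = { example: [0, 1, 2], test: [1, 2], case: [0, 1] };

// Each inner list is one excluded query word split at its hyphens.
// A file is excluded only if it contains every part of some group (AND);
// separate groups are still combined with OR.
function isExcludedByGroups(file, excludedGroups) {
  return excludedGroups.some((group) =>
    group.every((term) => [].concat(terms[term] ?? []).includes(file)),
  );
}

const groups = [["test", "case"]]; // from the query word "-test-case"
const remaining = [0, 1, 2].filter((f) => !isExcludedByGroups(f, groups));
// remaining: [0, 2]
```

Note that this would exclude any file containing both words anywhere, not only files containing the literal hyphenated word, so it approximates the desired behaviour rather than matching it exactly.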

Altogether, I think that this is a very valid concern. Still, I would not address this in this MR but rather open another one to get the bugfix out quickly 😁

Development

Successfully merging this pull request may close these issues.

Word exclusion breaks search
