Skip to content

Conversation

Samiul-TheSoccerFan
Copy link
Contributor

@Samiul-TheSoccerFan Samiul-TheSoccerFan commented Mar 18, 2025

An integration test to:

  • Select a set of index versions randomly from the available versions.
  • Creates an index for each selected index version
  • Ingests data
  • Performs semantic searches (including highlighting when applicable)

@Samiul-TheSoccerFan Samiul-TheSoccerFan added >test Issues or PRs that are addressing/adding tests v8.19.0 labels Mar 18, 2025
@Samiul-TheSoccerFan Samiul-TheSoccerFan force-pushed the index-version-compatibility-IT branch from 3cb2949 to 8332e8d Compare March 24, 2025 22:08
@Samiul-TheSoccerFan Samiul-TheSoccerFan marked this pull request as ready for review March 24, 2025 22:08
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Mar 24, 2025
@Samiul-TheSoccerFan Samiul-TheSoccerFan added Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch :SearchOrg/Relevance Label for the Search (solution/org) Relevance team and removed needs:triage Requires assignment of a team area label Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch labels Mar 24, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/search-eng (Team:SearchOrg)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/search-relevance (Team:Search - Relevance)

@Samiul-TheSoccerFan Samiul-TheSoccerFan added the auto-backport Automatically create backport pull requests when merged label Mar 24, 2025
@Samiul-TheSoccerFan
Copy link
Contributor Author

@elasticmachine update branch

@Samiul-TheSoccerFan
Copy link
Contributor Author

@elasticmachine update branch

import static org.hamcrest.Matchers.equalTo;

public class SemanticTextIndexVersionIT extends ESIntegTestCase {
private static final IndexVersion SEMANTIC_TEXT_INTRODUCED_VERSION = IndexVersion.fromId(8512000);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This index version starts from 8.15. If we use the SEMANTIC_TEXT_FIELD_TYPE, the index version starts from 8.14.4/5 and fails to create the semantic_text field.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried starting from SEMANTIC_TEXT_FIELD_TYPE and the test passes. Perhaps the test failure was related to other problems that you've now fixed?

Copy link
Contributor

@Mikep86 Mikep86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good start to this, I left some comments on how to move it forward

@kderusso
Copy link
Member

Interesting, we should follow up on this. It's strange that no plugin registers SparseVectorQueryBuilder as a named writeable, yet your test requires it.

@Mikep86 @Samiul-TheSoccerFan the query is registered in XPackClientPlugin did you try pulling it from there?

@Mikep86
Copy link
Contributor

Mikep86 commented Mar 27, 2025

@kderusso

the query is registered in XPackClientPlugin did you try pulling it from there?

XPackClientPlugin only registers it as a query, not a named writeable. It shouldn't be necessary to register it as a named writeable because rewriting to sparse_vector happens on the shard.

We debugged the issue a bit and the serialization happens when generating the search response in the test, after the query execution is complete. Adding to the mystery, this serialization does not happen when executing a semantic query in production. So the test is following a different code path that we need to understand.

@Mikep86
Copy link
Contributor

Mikep86 commented Mar 27, 2025

@Samiul-TheSoccerFan @kderusso Figured out what's going on with SparseVectorQueryBuilder named writeable. The query goes through a serialization/deserialization cycle as part of creating the search response (both in prod and test code), but only in a multi-node cluster. The test fails when it (randomly) uses a multi-node cluster because SparseVectorQueryBuilder is not registered as a named writeable. But why not?

Queries exposed via getQueries are registered as queries and named writeables. SparseVectorQueryBuilder is registered as a query by XPackClientPlugin. LocalStateInferencePlugin extends XPackClientPlugin, but we have overridden getQueries, preventing SparseVectorQueryBuilder from being registered. It doesn't work to add XPackClientPlugin as a separate plugin dependency since that would result in double inclusion of elements that aren't overridden by LocalStateInferencePlugin (it's also a messier solution). Instead, the correct solution is to ensure that LocalStateInferencePlugin.getQueries registers SparseVectorQueryBuilder.

IMO the way to do this is to add the following to LocalStateCompositeXPackPlugin:

    @Override
    public List<QuerySpec<?>> getQueries() {
        List<QuerySpec<?>> querySpecs = new ArrayList<>(super.getQueries()); // The xpack plugin also registers queries, include those
        filterPlugins(SearchPlugin.class).stream().flatMap(p -> p.getQueries().stream()).forEach(querySpecs::add);

        return querySpecs;
    }

This should handle the case for every plugin that extends LocalStateCompositeXPackPlugin, saving us some future headaches.

@Samiul-TheSoccerFan
Copy link
Contributor Author

@elasticmachine update branch

Copy link
Contributor

@Mikep86 Mikep86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking much better, two small things to adjust and I think we're good :)

import static org.hamcrest.Matchers.equalTo;

public class SemanticTextIndexVersionIT extends ESIntegTestCase {
private static final IndexVersion SEMANTIC_TEXT_INTRODUCED_VERSION = IndexVersion.fromId(8512000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried starting from SEMANTIC_TEXT_FIELD_TYPE and the test passes. Perhaps the test failure was related to other problems that you've now fixed?

@Samiul-TheSoccerFan
Copy link
Contributor Author

@elasticmachine update branch

@Mikep86
Copy link
Contributor

Mikep86 commented Mar 28, 2025

@elasticmachine update branch

@Samiul-TheSoccerFan
Copy link
Contributor Author

@elasticmachine update branch

Copy link
Member

@kderusso kderusso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - Very nice work, and thanks for all of the iterations! 👏

Copy link
Contributor

@Mikep86 Mikep86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was waiting for green CI, but it looks like that's hosed for unrelated reasons. Good work!

@Samiul-TheSoccerFan
Copy link
Contributor Author

@elasticmachine update branch

@Samiul-TheSoccerFan
Copy link
Contributor Author

@elasticmachine update branch

@Samiul-TheSoccerFan Samiul-TheSoccerFan merged commit d8ae61e into elastic:main Apr 1, 2025
17 checks passed
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x

Samiul-TheSoccerFan added a commit to Samiul-TheSoccerFan/elasticsearch that referenced this pull request Apr 1, 2025
* Initial draft test with index version  setup

* Adding test in phases

* [CI] Auto commit changes from spotless

* Adding test for search functionality

* Adding test for highlighting

* Adding randomization during selection process

* Fix code styles by running spotlessApply

* Fix code styles by running spotlessApply

* Fixing forbiddenAPIcall issue

* Decoupled namedWritables to use separate fake plugin and simplified other override methods

* Updating settings string to variable and removed unused code

* Fix SemanticQueryBuilder dependencies

* fix setting maximum number of tests to run

* utilizing semantci_text index version param and removed unwanted override

---------

Co-authored-by: elasticsearchmachine <[email protected]>
Co-authored-by: Elastic Machine <[email protected]>
elasticsearchmachine pushed a commit that referenced this pull request Apr 1, 2025
* [Semantic Text] Integration Test (#125141)

* Initial draft test with index version  setup

* Adding test in phases

* [CI] Auto commit changes from spotless

* Adding test for search functionality

* Adding test for highlighting

* Adding randomization during selection process

* Fix code styles by running spotlessApply

* Fix code styles by running spotlessApply

* Fixing forbiddenAPIcall issue

* Decoupled namedWritables to use separate fake plugin and simplified other override methods

* Updating settings string to variable and removed unused code

* Fix SemanticQueryBuilder dependencies

* fix setting maximum number of tests to run

* utilizing semantci_text index version param and removed unwanted override

---------

Co-authored-by: elasticsearchmachine <[email protected]>
Co-authored-by: Elastic Machine <[email protected]>

* Removing time value field for 8.*

---------

Co-authored-by: elasticsearchmachine <[email protected]>
Co-authored-by: Elastic Machine <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged :SearchOrg/Relevance Label for the Search (solution/org) Relevance team >test Issues or PRs that are addressing/adding tests v8.19.0 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants