Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Semantic Text] Integration Test #125141

Conversation

Samiul-TheSoccerFan
Copy link
Contributor

@Samiul-TheSoccerFan Samiul-TheSoccerFan commented Mar 18, 2025

An integration test to:

  • Select a set of index versions randomly from the available versions.
  • Creates an index for each selected index version
  • Ingests data
  • Performs semantic searches (including highlighting when applicable)

@Samiul-TheSoccerFan Samiul-TheSoccerFan added >test Issues or PRs that are addressing/adding tests v8.19.0 labels Mar 18, 2025
@Samiul-TheSoccerFan Samiul-TheSoccerFan force-pushed the index-version-compatibility-IT branch from 3cb2949 to 8332e8d Compare March 24, 2025 22:08
@Samiul-TheSoccerFan Samiul-TheSoccerFan marked this pull request as ready for review March 24, 2025 22:08
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Mar 24, 2025
@Samiul-TheSoccerFan Samiul-TheSoccerFan added Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch :SearchOrg/Relevance Label for the Search (solution/org) Relevance team and removed needs:triage Requires assignment of a team area label Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch labels Mar 24, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/search-eng (Team:SearchOrg)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/search-relevance (Team:Search - Relevance)

@Samiul-TheSoccerFan Samiul-TheSoccerFan added the auto-backport Automatically create backport pull requests when merged label Mar 24, 2025
@Samiul-TheSoccerFan
Copy link
Contributor Author

@elasticmachine update branch

@Samiul-TheSoccerFan
Copy link
Contributor Author

@elasticmachine update branch

import static org.hamcrest.Matchers.equalTo;

public class SemanticTextIndexVersionIT extends ESIntegTestCase {
private static final IndexVersion SEMANTIC_TEXT_INTRODUCED_VERSION = IndexVersion.fromId(8512000);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This index version starts from 8.15. If we use the SEMANTIC_TEXT_FIELD_TYPE, the index version starts from 8.14.4/5 and fails to create the semantic_text field.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried starting from SEMANTIC_TEXT_FIELD_TYPE and the test passes. Perhaps the test failure was related to other problems that you've now fixed?

Copy link
Contributor

@Mikep86 Mikep86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good start to this, I left some comments on how to move it forward

@kderusso
Copy link
Member

Interesting, we should follow up on this. It's strange that no plugin registers SparseVectorQueryBuilder as a named writeable, yet your test requires it.

@Mikep86 @Samiul-TheSoccerFan the query is registered in XPackClientPlugin did you try pulling it from there?

@Mikep86
Copy link
Contributor

Mikep86 commented Mar 27, 2025

@kderusso

the query is registered in XPackClientPlugin did you try pulling it from there?

XPackClientPlugin only registers it as a query, not a named writeable. It shouldn't be necessary to register it as a named writeable because rewriting to sparse_vector happens on the shard.

We debugged the issue a bit and the serialization happens when generating the search response in the test, after the query execution is complete. Adding to the mystery, this serialization does not happen when executing a semantic query in production. So the test is following a different code path that we need to understand.

@Mikep86
Copy link
Contributor

Mikep86 commented Mar 27, 2025

@Samiul-TheSoccerFan @kderusso Figured out what's going on with SparseVectorQueryBuilder named writeable. The query goes through a serialization/deserialization cycle as part of creating the search response (both in prod and test code), but only in a multi-node cluster. The test fails when it (randomly) uses a multi-node cluster because SparseVectorQueryBuilder is not registered as a named writeable. But why not?

Queries exposed via getQueries are registered as queries and named writeables. SparseVectorQueryBuilder is registered as a query by XPackClientPlugin. LocalStateInferencePlugin extends XPackClientPlugin, but we have overridden getQueries, preventing SparseVectorQueryBuilder from being registered. It doesn't work to add XPackClientPlugin as a separate plugin dependency since that would result in double inclusion of elements that aren't overridden by LocalStateInferencePlugin (it's also a messier solution). Instead, the correct solution is to ensure that LocalStateInferencePlugin.getQueries registers SparseVectorQueryBuilder.

IMO the way to do this is to add the following to LocalStateCompositeXPackPlugin:

    @Override
    public List<QuerySpec<?>> getQueries() {
        List<QuerySpec<?>> querySpecs = new ArrayList<>(super.getQueries()); // The xpack plugin also registers queries, include those
        filterPlugins(SearchPlugin.class).stream().flatMap(p -> p.getQueries().stream()).forEach(querySpecs::add);

        return querySpecs;
    }

This should handle the case for every plugin that extends LocalStateCompositeXPackPlugin, saving us some future headaches.

@Samiul-TheSoccerFan
Copy link
Contributor Author

@elasticmachine update branch

Copy link
Contributor

@Mikep86 Mikep86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking much better, two small things to adjust and I think we're good :)

import static org.hamcrest.Matchers.equalTo;

public class SemanticTextIndexVersionIT extends ESIntegTestCase {
private static final IndexVersion SEMANTIC_TEXT_INTRODUCED_VERSION = IndexVersion.fromId(8512000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried starting from SEMANTIC_TEXT_FIELD_TYPE and the test passes. Perhaps the test failure was related to other problems that you've now fixed?

@Samiul-TheSoccerFan
Copy link
Contributor Author

@elasticmachine update branch

@Mikep86
Copy link
Contributor

Mikep86 commented Mar 28, 2025

@elasticmachine update branch

@Samiul-TheSoccerFan
Copy link
Contributor Author

@elasticmachine update branch

Copy link
Member

@kderusso kderusso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - Very nice work, and thanks for all of the iterations! 👏

Copy link
Contributor

@Mikep86 Mikep86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was waiting for green CI, but it looks like that's hosed for unrelated reasons. Good work!

@Samiul-TheSoccerFan
Copy link
Contributor Author

@elasticmachine update branch

@Samiul-TheSoccerFan
Copy link
Contributor Author

@elasticmachine update branch

@Samiul-TheSoccerFan Samiul-TheSoccerFan merged commit d8ae61e into elastic:main Apr 1, 2025
17 checks passed
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x

Samiul-TheSoccerFan added a commit to Samiul-TheSoccerFan/elasticsearch that referenced this pull request Apr 1, 2025
* Initial draft test with index version  setup

* Adding test in phases

* [CI] Auto commit changes from spotless

* Adding test for search functionality

* Adding test for highlighting

* Adding randomization during selection process

* Fix code styles by running spotlessApply

* Fix code styles by running spotlessApply

* Fixing forbiddenAPIcall issue

* Decoupled namedWritables to use separate fake plugin and simplified other override methods

* Updating settings string to variable and removed unused code

* Fix SemanticQueryBuilder dependencies

* fix setting maximum number of tests to run

* utilizing semantci_text index version param and removed unwanted override

---------

Co-authored-by: elasticsearchmachine <[email protected]>
Co-authored-by: Elastic Machine <[email protected]>
elasticsearchmachine added a commit that referenced this pull request Apr 1, 2025
* [Semantic Text] Integration Test (#125141)

* Initial draft test with index version  setup

* Adding test in phases

* [CI] Auto commit changes from spotless

* Adding test for search functionality

* Adding test for highlighting

* Adding randomization during selection process

* Fix code styles by running spotlessApply

* Fix code styles by running spotlessApply

* Fixing forbiddenAPIcall issue

* Decoupled namedWritables to use separate fake plugin and simplified other override methods

* Updating settings string to variable and removed unused code

* Fix SemanticQueryBuilder dependencies

* fix setting maximum number of tests to run

* utilizing semantci_text index version param and removed unwanted override

---------

Co-authored-by: elasticsearchmachine <[email protected]>
Co-authored-by: Elastic Machine <[email protected]>

* Removing time value field for 8.*

---------

Co-authored-by: elasticsearchmachine <[email protected]>
Co-authored-by: Elastic Machine <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged :SearchOrg/Relevance Label for the Search (solution/org) Relevance team >test Issues or PRs that are addressing/adding tests v8.19.0 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants