@bharath-techie
Contributor

@bharath-techie bharath-techie commented Oct 21, 2025

Description

  • Adds the basic structure for the common lib, the data format plugin, and the DataFusion plugin.
  • Adds a DataFusion context that holds the details needed to execute a DataFusion query and retrieve the results.
  • Adds a SearchOperations interface that tries to maintain compatibility while letting new engines plug in. [WIP: should we go this route, or have a separate path and interfaces for searcher / reader?]

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@bharath-techie bharath-techie force-pushed the search-df-main-interface branch 2 times, most recently from 478b56d to 94237cc Compare October 21, 2025 17:55
@bharath-techie bharath-techie force-pushed the search-df-main-interface branch from 94237cc to dc79a65 Compare October 21, 2025 17:57
@github-actions
Contributor

❌ Gradle check result for dc79a65: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

*/
DataFormat getDataFormat();

CompletableFuture<Void> closeSessionContext(long sessionContextId);
Member

The API here seems off: why is the data format codec managing the session context lifecycle?

Contributor Author

Yeah, this PR has no good examples of that; today the codec is mainly used only for configuration and optimizers.
https://github.com/bharath-techie/OpenSearch/pull/47/files#diff-48735da23f1ac176775a474bd04cd70ec100c2e487d72fb9285e9146897ce043R153 is an example of how the codec gets used.

Contributor Author

closeSessionContext is not used, so I will remove it.
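
One way to keep session lifecycle out of the codec is to hand the caller a closeable handle. This is only a minimal sketch, assuming a registry that tracks open session ids; `SessionContextRegistry` and `SessionContextHandle` are hypothetical names and not types from this PR:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical owner of session contexts; stands in for whichever
// component ends up creating them (not a type from this PR).
final class SessionContextRegistry {
    private final AtomicLong nextId = new AtomicLong(1);
    private final Set<Long> openIds = ConcurrentHashMap.newKeySet();

    // Registers a new session id and returns a handle that releases it on close().
    SessionContextHandle open() {
        long id = nextId.getAndIncrement();
        openIds.add(id);
        return new SessionContextHandle(id, this);
    }

    void release(long id) {
        openIds.remove(id);
    }

    int openCount() {
        return openIds.size();
    }
}

// Hypothetical handle: whoever opens the session closes it, so the
// codec interface needs no closeSessionContext method.
final class SessionContextHandle implements AutoCloseable {
    private final long id;
    private final SessionContextRegistry registry;

    SessionContextHandle(long id, SessionContextRegistry registry) {
        this.id = id;
        this.registry = registry;
    }

    long id() {
        return id;
    }

    @Override
    public void close() {
        registry.release(id); // idempotent: removing an absent id is a no-op
    }
}
```

With try-with-resources, `try (SessionContextHandle h = registry.open()) { ... }` releases the id when the block exits, whether or not the query code throws.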

* Represents a data format.
*/
@ExperimentalApi
public interface DataFormat {
Member

We've got two DataFormat interfaces: one here and one in libs.

Contributor Author

#19475 (comment): I'm facing this issue. We might need to move dataFormatCodec to core.

Contributor Author

I think this is a bit tricky: if I move the codec to core, then pluggable engine-specific pieces such as session config also need to be in core.

Do you have any suggestions?

https://github.com/opensearch-project/OpenSearch/pull/19760/files - Kindly refer to this for what configuration details might be present for DataFusion and Parquet.

* Represents a stream of record batches from a DataFusion query execution.
* This interface provides access to query results in a streaming fashion.
*/
public interface RecordBatchStream extends AutoCloseable {
Member

I think we should move these components from libs into the DF plugin, since that's the only place they are used?

Contributor Author

Yes, initially I had usage in the Parquet plugin, but I moved everything out and somehow this stayed in libs, so I will move this out too.
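
Wherever the interface ends up living, the fact that RecordBatchStream extends AutoCloseable means callers can scope it with try-with-resources. A rough sketch of that pattern follows; only `extends AutoCloseable` and the `CompletableFuture<RecordBatchStream>` return type come from this PR, while `hasNext()`, `next()`, the String batch type, `InMemoryRecordBatchStream`, and `BatchCounter` are assumptions made for illustration:

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Stand-in for the PR's interface. Only "extends AutoCloseable" comes from
// the PR; hasNext()/next() and the String batch type are assumptions.
interface RecordBatchStream extends AutoCloseable {
    boolean hasNext();

    String next();

    @Override
    void close(); // narrowed: no checked exception for callers to handle
}

// Hypothetical in-memory implementation used only to exercise the pattern.
final class InMemoryRecordBatchStream implements RecordBatchStream {
    private final Iterator<String> batches;
    private boolean closed;

    InMemoryRecordBatchStream(List<String> batches) {
        this.batches = batches.iterator();
    }

    @Override
    public boolean hasNext() {
        return !closed && batches.hasNext();
    }

    @Override
    public String next() {
        return batches.next();
    }

    @Override
    public void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

final class BatchCounter {
    // Waits for the stream, drains it, and closes it even if draining throws.
    static int countBatches(CompletableFuture<RecordBatchStream> pending) {
        try (RecordBatchStream stream = pending.join()) {
            int count = 0;
            while (stream.hasNext()) {
                stream.next();
                count++;
            }
            return count;
        }
    }
}
```

A caller would pass the future returned by something like executeSubstraitQuery straight into `countBatches`; the stream is closed once the try block exits.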

    CompletableFuture<RecordBatchStream> executeSubstraitQuery(long sessionContextId, byte[] substraitPlanBytes);

/**
* OpenSearch plugin that provides Parquet data format support for indexing and query operations.
*/
public class ParquetDataFormatPlugin extends Plugin implements DataFormatPlugin {
Member

@mch2 mch2 Oct 23, 2025

I'm still struggling with the need for this plugin at the top level. Can we not have the data format pieces as a lib inside engine-specific plugins, i.e. engines provide indexer and searcher functionality directly?

Contributor Author

You're right: searcher and reader are in fact completely in engine-datafusion, so we are already not using the format plugins for them. As commented above, I expect only configuration and optimizers to be in data format plugins.

But the indexer / writer is a complete piece that sits in the Parquet plugin, as it's specific to Parquet and agnostic of read engines.
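
The split described here can be sketched as two independent contracts, with the format plugin owning the write side and the engine plugin owning the read side. Everything below is hypothetical (the interface names, the String row type, the substring match) and only illustrates the boundary, not the actual interfaces in this PR:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical write-side contract, owned by a format plugin (e.g. Parquet).
// It knows nothing about which engine will read the data back.
interface FormatWriter {
    void write(String row);

    List<String> flushedRows();
}

// Hypothetical read-side contract, owned by an engine plugin (e.g. DataFusion).
// It only depends on the data the writer produced, not on the writer itself.
interface EngineSearcher {
    long count(List<String> rows, String term);
}

// Toy writer that keeps rows in memory instead of producing files.
final class InMemoryFormatWriter implements FormatWriter {
    private final List<String> rows = new ArrayList<>();

    @Override
    public void write(String row) {
        rows.add(row);
    }

    @Override
    public List<String> flushedRows() {
        return List.copyOf(rows); // immutable snapshot for readers
    }
}

// Toy searcher that counts rows containing the term.
final class ContainsSearcher implements EngineSearcher {
    @Override
    public long count(List<String> rows, String term) {
        return rows.stream().filter(row -> row.contains(term)).count();
    }
}
```

Because neither interface references the other, a second read engine could be added without touching the writer, which is the property the comment above relies on.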

