| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: "Making search smarter: System generated search pipelines" |
| 5 | +authors: |
| 6 | + - bzhangam |
| 7 | +date: 2025-10-24 |
| 8 | +has_science_table: true |
| 9 | +categories: |
| 10 | + - technical-posts |
| 11 | +meta_keywords: system generated search pipeline, OpenSearch 3.3, plugin development, system generated pipeline, system generated search processor |
| 12 | +meta_description: Learn how OpenSearch automatically generates and executes search processors during query evaluation, how plugin developers can extend this mechanism, and how to monitor system-generated processors using the Search Pipeline Stats API. |
| 13 | +--- |
| 14 | + |
| 15 | +## Making search smarter: system-generated search pipelines in OpenSearch |
| 16 | + |
| 17 | +OpenSearch 3.3 introduces system-generated search pipelines, a new capability designed for plugin developers. With it, OpenSearch can process search requests automatically by generating and attaching system search processors at runtime, based on the request context and parameters. |
| 18 | + |
| 19 | +This capability enables plugin developers to embed search-time processing logic directly into their plugins—without requiring users to manually create or configure search pipelines. It simplifies integration and creates a smoother, more intelligent search experience out of the box. |
| 20 | + |
| 21 | +Previously, if plugin developers built a custom search processor, users needed to explicitly configure a search pipeline that included the processor and then reference it in their queries. With system-generated search pipelines, OpenSearch can automatically generate and manage these processors, reducing manual setup while maintaining full compatibility with user-defined pipelines. |
| 22 | + |
| 23 | +## System-generated search pipeline compared to standard search pipeline |
| 24 | + |
| 25 | +In OpenSearch, a standard search pipeline is defined using the [Search Pipeline API](https://docs.opensearch.org/latest/search-plugins/search-pipelines/index/). |
| 26 | +Users must manually configure and reference these pipelines in their search requests. |
| 27 | + |
| 28 | +A system-generated search pipeline works similarly—it executes one or more processors during the search request lifecycle—but users do not configure it manually. Instead, OpenSearch automatically generates the pipeline at query time, based on the registered system processor factories in your plugin and the details of the incoming request. |
| 29 | + |
| 30 | +The following table summarizes the key differences between standard and system-generated search pipelines in OpenSearch. |
| 31 | + |
| 32 | +| **Pipeline type** | **How it's defined** | **How it's triggered** | **How to disable it** | |
| 33 | +| ------------------------------------ | ----------------------------------------------------------------------------------------------------------- |--------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------| |
| 34 | +| **Standard search pipeline** | Manually defined by users using the Search Pipeline API | Referenced by name in a search request or set as the default search pipeline in cluster settings | Remove the pipeline reference from search requests or clear default pipeline settings | |
| 35 | +| **System-generated search pipeline** | Automatically generated by OpenSearch based on request evaluation and plugin-registered processor factories | Triggered automatically when the search request matches criteria defined by the system-generated processor factory | All system-generated search processor factories are disabled by default. A factory takes effect only after it is enabled in the `cluster.search.enabled_system_generated_factories` cluster setting. | |
| 36 | + |
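| | +To make the last row concrete, the following request is a minimal sketch of enabling a single factory. It assumes that the setting accepts a list of factory types and uses the type of the example factory built later in this post; verify the accepted values against the documentation for your OpenSearch version. |
| | + |
| | +```json |
| | +PUT /_cluster/settings |
| | +{ |
| | +  "persistent": { |
| | +    "cluster.search.enabled_system_generated_factories": ["example-search-request-post-processor-factory"] |
| | +  } |
| | +} |
| | +``` |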
| 37 | + |
| 38 | +## How it works |
| 39 | + |
| 40 | +When OpenSearch receives a search request, it evaluates the request parameters and context to determine whether to generate system search processors. These processors can be inserted at different stages of the search lifecycle: |
| 41 | + |
| 42 | + - **System-generated search request processors** modify or enrich the incoming request before execution. |
| 43 | + |
| 44 | + - **System-generated search phase results processors** operate after shard-level results are collected, allowing aggregation or transformation of intermediate results. |
| 45 | + |
| 46 | + - **System-generated search response processors** modify the final response before it is returned to the client. |
| 47 | + |
| 48 | +During this evaluation, OpenSearch dynamically constructs a system-generated search pipeline and merges it with any user-defined pipeline specified in the request. System-generated processors are created only when the request meets specific criteria defined in your plugin’s factory implementation—for example, when a query includes certain parameters, or when a specific search type (such as neural or knn) is detected. |
| 49 | + |
| 50 | +The following diagram illustrates how OpenSearch resolves system-generated search pipelines during query execution. |
| 51 | + |
| 52 | + |
| 53 | + |
| 54 | +OpenSearch automatically manages execution order, ensuring that system-generated processors run in the correct phase and relative position to any user-defined processors. This ensures compatibility and predictable execution without additional configuration from users. |
| 55 | + |
| 56 | +The following diagram illustrates how OpenSearch executes system-generated search request processors during query execution. A similar pattern applies to system-generated search phase results processors and search response processors. |
| 57 | + |
| 58 | + |
| 59 | + |
| 60 | +## Building a custom system-generated search processor |
| 61 | + |
| 62 | +Plugin developers can define custom system-generated search processors in their plugins. To do this, you need to: |
| 63 | + |
| 64 | + - **Create a system search processor:** Implement the processor logic by implementing one of the search processor interfaces (such as `SearchRequestProcessor`, `SearchPhaseResultProcessor`, or `SearchResponseProcessor`) together with `SystemGeneratedProcessor`. |
| 65 | + |
| 66 | + - **Create a processor factory:** Implement a factory that determines when OpenSearch should generate and attach the processor. |
| 67 | + |
| 68 | + - **Register the factory:** Register your factory with the OpenSearch plugin so it can participate in automatic pipeline generation. |
| 69 | + |
| 70 | +Follow these steps to build a simple example of a system-generated search request processor. |
| 71 | + |
| 72 | +### Step 1: Create a system search processor |
| 73 | +```java |
| 74 | +/** |
| 75 | + * An example system-generated search request processor that is executed after the user-defined processors |
| 76 | + */ |
| 77 | +public class ExampleSearchRequestPostProcessor implements SearchRequestProcessor, SystemGeneratedProcessor { |
| 78 | + /** |
| 79 | + * type of the processor |
| 80 | + */ |
| 81 | + public static final String TYPE = "example-search-request-post-processor"; |
| 82 | + /** |
| 83 | + * description of the processor |
| 84 | + */ |
| 85 | + public static final String DESCRIPTION = "This is a system-generated search request processor which will be " |
| 86 | + + "executed after the user-defined search request processors. It will increase the query size by 2."; |
| 87 | + private final String tag; |
| 88 | + private final boolean ignoreFailure; |
| 89 | + |
| 90 | + /** |
| 91 | + * ExampleSearchRequestPostProcessor constructor |
| 92 | + * @param tag tag of the processor |
| 93 | + * @param ignoreFailure should processor ignore the failure |
| 94 | + */ |
| 95 | + public ExampleSearchRequestPostProcessor(String tag, boolean ignoreFailure) { |
| 96 | + this.tag = tag; |
| 97 | + this.ignoreFailure = ignoreFailure; |
| 98 | + } |
| 99 | + |
| 100 | + @Override |
| 101 | + public SearchRequest processRequest(SearchRequest request) { |
| 102 | + if (request == null || request.source() == null) { |
| 103 | + return request; |
| 104 | + } |
| 105 | + int size = request.source().size(); |
| 106 | + request.source().size(size + 2); |
| 107 | + return request; |
| 108 | + } |
| 109 | + |
| 110 | + @Override |
| 111 | + public String getType() { |
| 112 | + return TYPE; |
| 113 | + } |
| 114 | + |
| 115 | + @Override |
| 116 | + public String getTag() { |
| 117 | + return this.tag; |
| 118 | + } |
| 119 | + |
| 120 | + @Override |
| 121 | + public String getDescription() { |
| 122 | + return DESCRIPTION; |
| 123 | + } |
| 124 | + |
| 125 | + @Override |
| 126 | + public boolean isIgnoreFailure() { |
| 127 | + return this.ignoreFailure; |
| 128 | + } |
| 129 | + |
| 130 | + @Override |
| 131 | + public ExecutionStage getExecutionStage() { |
| 132 | + // This processor will be executed after the user defined search request processor |
| 133 | + return ExecutionStage.POST_USER_DEFINED; |
| 134 | + } |
| 135 | +} |
| 136 | + |
| 137 | +``` |
| 138 | +### Step 2: Create a processor factory |
| 139 | +```java |
| 140 | + public class Factory implements SystemGeneratedFactory<SearchRequestProcessor> { |
| 141 | + public static final String TYPE = "example-search-request-post-processor-factory"; |
| 142 | + |
| 143 | + // We auto generate the processor if the original query size is less than 5. |
| 144 | + @Override |
| 145 | + public boolean shouldGenerate(ProcessorGenerationContext context) { |
| 146 | + SearchRequest searchRequest = context.searchRequest(); |
| 147 | + if (searchRequest == null || searchRequest.source() == null) { |
| 148 | + return false; |
| 149 | + } |
| 150 | + int size = searchRequest.source().size(); |
| 151 | + return size < 5; |
| 152 | + } |
| 153 | + |
| 154 | + @Override |
| 155 | + public SearchRequestProcessor create( |
| 156 | + Map<String, Processor.Factory<SearchRequestProcessor>> processorFactories, |
| 157 | + String tag, |
| 158 | + String description, |
| 159 | + boolean ignoreFailure, |
| 160 | + Map<String, Object> config, |
| 161 | + PipelineContext pipelineContext |
| 162 | + ) throws Exception { |
| 163 | + return new ExampleSearchRequestPostProcessor(tag, ignoreFailure); |
| 164 | + } |
| 165 | + } |
| 166 | +``` |
| 167 | + |
| 168 | +The `shouldGenerate()` method is called for every search request. Avoid performing any time-consuming or resource-intensive logic in this method. It should remain lightweight — its sole purpose is to quickly decide whether a processor needs to be generated. |
| 169 | +### Step 3: Register the factory in your plugin |
| 170 | +```java |
| 171 | + @Override |
| 172 | + public Map<String, SystemGeneratedProcessor.SystemGeneratedFactory<SearchRequestProcessor>> getSystemGeneratedRequestProcessors( |
| 173 | + Parameters parameters |
| 174 | + ) { |
| 175 | + return Map.of( |
| 176 | + ExampleSearchRequestPostProcessor.Factory.TYPE, |
| 177 | + new ExampleSearchRequestPostProcessor.Factory() |
| 178 | + ); |
| 179 | + } |
| 180 | +``` |
| 181 | +Once the factory is registered and enabled, OpenSearch automatically evaluates incoming search requests, generates system processors where applicable, and inserts them into the runtime search pipeline. For more examples, see the [example plugin](https://github.com/opensearch-project/OpenSearch/tree/main/plugins/examples/system-search-processor/src/main/java/org/opensearch/example/systemsearchprocessor). |
| 182 | + |
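| | +As an illustration, assume that the example factory above is registered and enabled. A request such as the following would satisfy its `shouldGenerate()` check because the requested size (3) is less than 5, so the example processor would raise the size to 5 before the query runs. The index name `my-index` is hypothetical. |
| | + |
| | +```json |
| | +GET /my-index/_search |
| | +{ |
| | +  "size": 3, |
| | +  "query": { |
| | +    "match_all": {} |
| | +  } |
| | +} |
| | +``` |
| | + |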
| 183 | +Currently, OpenSearch allows only one system-generated search processor per type and stage for each search request. For example, only one system-generated search request processor can run before user-defined processors. This design simplifies execution order management and ensures predictable behavior across different plugins. |
| 184 | + |
| 185 | +In most cases, a single processor per type and stage is sufficient, but future releases may support multiple processors if use cases arise. |
| 186 | + |
| 187 | +You can also add logic in your processor to detect and handle conflicts between your system-generated processors and user-defined processors. This is useful if your processor cannot coexist with certain user-defined ones or if you need to enforce execution constraints. |
| 188 | + |
| 189 | +The following example shows how to handle a conflict between a system-generated search processor and a user-defined search processor. |
| 190 | +```java |
| 191 | + @Override |
| 192 | + public void evaluateConflicts(ProcessorConflictEvaluationContext context) throws IllegalArgumentException { |
| 193 | + boolean hasTruncateHitsProcessor = context.getUserDefinedSearchResponseProcessors() |
| 194 | + .stream() |
| 195 | + .anyMatch(processor -> CONFLICT_PROCESSOR_TYPE.equals(processor.getType())); |
| 196 | + |
| 197 | + if (hasTruncateHitsProcessor) { |
| 198 | + throw new IllegalArgumentException( |
| 199 | + String.format( |
| 200 | + Locale.ROOT, |
| 201 | + "The [%s] processor cannot be used in a search pipeline because it conflicts with the [%s] processor, " |
| 202 | + + "which is automatically generated when executing a match query against [%s].", |
| 203 | + CONFLICT_PROCESSOR_TYPE, |
| 204 | + TYPE, |
| 205 | + TRIGGER_FIELD |
| 206 | + ) |
| 207 | + ); |
| 208 | + } |
| 209 | + } |
| 210 | +``` |
| 211 | + |
| 212 | +## Monitoring system-generated search processors |
| 213 | +OpenSearch provides the Search Pipeline Stats API to help developers monitor performance and execution metrics for both user-defined and system-generated processors. |
| 214 | + |
| 215 | +You can access these metrics using the following command: |
| 216 | + |
| 217 | +```json |
| 218 | +GET /_nodes/stats/search_pipeline |
| 219 | +``` |
| 220 | + |
| 221 | +The response includes a `system_generated_processors` section that reports statistics for each processor type, as well as a `system_generated_factories` section that reports evaluation and generation metrics for each processor factory. |
| 222 | +For example: |
| 223 | +```json |
| 224 | +{ |
| 225 | + "nodes": { |
| 226 | + "gv8NncXIRiSaA7egwHzfJg": { |
| 227 | + "search_pipeline": { |
| 228 | + "system_generated_processors": { |
| 229 | + "request_processors": [ |
| 230 | + { |
| 231 | + "example-search-request-post-processor": { |
| 232 | + "type": "example-search-request-post-processor", |
| 233 | + "stats": { |
| 234 | + "count": 13, |
| 235 | + "time_in_millis": 1, |
| 236 | + "failed": 0 |
| 237 | + } |
| 238 | + } |
| 239 | + } |
| 240 | + ] |
| 241 | + }, |
| 242 | + "system_generated_factories": { |
| 243 | + "request_processor_factories": [ |
| 244 | + { |
| 245 | + "example-search-request-post-processor-factory": { |
| 246 | + "type": "example-search-request-post-processor-factory", |
| 247 | + "evaluation_stats": { |
| 248 | + "count": 37, |
| 249 | + "time_in_microseconds": 185, |
| 250 | + "failed": 0 |
| 251 | + }, |
| 252 | + "generation_stats": { |
| 253 | + "count": 13, |
| 254 | + "time_in_microseconds": 1, |
| 255 | + "failed": 0 |
| 256 | + } |
| 257 | + } |
| 258 | + } |
| 259 | + ] |
| 260 | + } |
| 261 | + } |
| 262 | + } |
| 263 | + } |
| 264 | +} |
| 265 | + |
| 266 | +``` |
| 267 | +The `system_generated_factories` section reports how many times OpenSearch evaluated and generated processors: |
| 268 | + |
| 269 | + - `evaluation_stats` shows how many search requests were evaluated by the factory to decide whether a processor should be generated. |
| 270 | + |
| 271 | + - `generation_stats` shows how many times a processor was actually created and how much time was spent generating it. |
| 272 | + |
| 273 | +These metrics make it easy to determine whether your system-generated processors are behaving as expected and to identify potential performance bottlenecks. |
| 274 | + |
| 275 | +## Summary |
| 276 | + |
| 277 | +System-generated search pipelines extend OpenSearch’s search framework by allowing automatic generation and execution of search processors based on request context. This simplifies plugin development and removes the need for users to create and reference search pipelines manually. |
| 278 | + |
| 279 | +Plugin developers can use this capability to embed custom logic that runs automatically—such as reranking, result diversification, or query enrichment—without requiring users to define search pipelines manually. |