---
layout: post
title: "Improving vector search diversity through native MMR"
authors:
- bzhangam
date: 2025-10-24
has_science_table: true
categories:
- technical-posts
meta_keywords: MMR, Maximal Marginal Relevance, search diversity, search ranking, OpenSearch 3.3, vector search
meta_description: Learn how to use Maximal Marginal Relevance (MMR) in OpenSearch to make your search results more diverse.
---
## Improving vector search diversity through native MMR

When it comes to search and recommendation systems, returning highly relevant results is only half the battle. Equally important is diversity: ensuring users see a range of results rather than multiple near-duplicates. OpenSearch 3.3 now supports native Maximal Marginal Relevance (MMR) for k-NN and neural queries, making this easy.
## What is MMR?

Maximal Marginal Relevance (MMR) is a re-ranking algorithm that balances relevance and diversity:

- **Relevance:** How well a result matches the query.
- **Diversity:** How different the results are from each other.

MMR iteratively selects results that are relevant to the query and not too similar to previously selected results. The trade-off is controlled by a diversity parameter (0 = prioritize relevance, 1 = prioritize diversity).

In vector search, this is particularly useful because embeddings often cluster similar results together. Without MMR, the top-k results might all look nearly identical.
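The selection loop described above can be sketched in a few lines of Python. This is an illustrative re-implementation, not OpenSearch's native code: the relevance scores, toy embeddings, and cosine similarity below are stand-ins for whatever the index actually stores.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (illustrative choice of metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def mmr_rerank(candidates, k, diversity):
    """Greedily pick k results, trading off relevance vs. diversity.

    candidates: list of (doc_id, relevance_score, embedding) tuples.
    diversity:  0 = pure relevance ranking, 1 = pure diversity.
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr_score(c):
            # Penalize similarity to anything already selected.
            max_sim = max((cosine(c[2], s[2]) for s in selected), default=0.0)
            return (1 - diversity) * c[1] - diversity * max_sim
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return [c[0] for c in selected]

candidates = [
    ("Red apple.", 0.95, [1.0, 0.0]),
    ("Red apple from USA.", 0.94, [0.99, 0.14]),  # near-duplicate of the first
    ("Orange juice.", 0.60, [0.0, 1.0]),
]
# With diversity=0.7, the near-duplicate "Red apple from USA." is displaced
# by the less relevant but more distinct "Orange juice."
ranked = mmr_rerank(candidates, k=2, diversity=0.7)
```

With `diversity=0`, the two near-duplicate apple results both make the cut; raising `diversity` lets a distinct result displace the second duplicate, which is exactly the behavior shown in the query examples later in this post.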
## Native MMR in OpenSearch

Previously, MMR could only be implemented externally, requiring custom pipelines and extra coding. Now, OpenSearch supports native MMR directly in k-NN and neural queries on `knn_vector` fields. This simplifies your setup and reduces latency.
## How to Use MMR

### Prerequisites

Before using Maximal Marginal Relevance (MMR) for reranking, make sure the required [system-generated search processor factories](https://docs.opensearch.org/latest/search-plugins/search-pipelines/system-generated-search-processors/) are enabled in your cluster:

```json
PUT _cluster/settings
{
  "persistent": {
    "cluster.search.enabled_system_generated_factories": [
      "mmr_over_sample_factory",
      "mmr_rerank_factory"
    ]
  }
}
```

These factories enable OpenSearch to automatically perform the oversampling and reranking steps needed for MMR.
### Example: Improving Diversity in Neural Search

Suppose we have a neural search index with a semantic field for product descriptions using a dense embedding model. You can set up your index following this [guide](https://docs.opensearch.org/latest/field-types/supported-field-types/semantic/).

#### Index Sample Data

We index a few example product descriptions:

```json
PUT /_bulk

{ "update": { "_index": "my-nlp-index", "_id": "1" } }
{ "doc": {"product_description": "Red apple from USA."}, "doc_as_upsert": true }

{ "update": { "_index": "my-nlp-index", "_id": "2" } }
{ "doc": {"product_description": "Red apple from usa."}, "doc_as_upsert": true }

{ "update": { "_index": "my-nlp-index", "_id": "3" } }
{ "doc": {"product_description": "Crispy apple."}, "doc_as_upsert": true }

{ "update": { "_index": "my-nlp-index", "_id": "4" } }
{ "doc": {"product_description": "Red apple."}, "doc_as_upsert": true }

{ "update": { "_index": "my-nlp-index", "_id": "5" } }
{ "doc": {"product_description": "Orange juice from usa."}, "doc_as_upsert": true }
```
#### Query Without MMR

A standard neural search query for "Red apple" might look like this:

```json
GET /my-nlp-index/_search
{
  "size": 3,
  "_source": { "excludes": ["product_description_semantic_info"] },
  "query": {
    "neural": {
      "product_description": { "query_text": "Red apple" }
    }
  }
}
```

Results:

```json
"hits": [
  { "_id": "4", "_score": 0.956, "_source": {"product_description": "Red apple."} },
  { "_id": "1", "_score": 0.743, "_source": {"product_description": "Red apple from USA."} },
  { "_id": "2", "_score": 0.743, "_source": {"product_description": "Red apple from usa."} }
]
```

Notice how all of the top results are very similar: there's little diversity in what the user sees.
#### Query With MMR

By adding MMR, we can diversify the top results while maintaining relevance:

```json
GET /my-nlp-index/_search
{
  "size": 3,
  "_source": { "excludes": ["product_description_semantic_info"] },
  "query": {
    "neural": {
      "product_description": { "query_text": "Red apple" }
    }
  },
  "ext": {
    "mmr": {
      "candidates": 10,
      "diversity": 0.4
    }
  }
}
```

Results:

```json
"hits": [
  { "_id": "4", "_score": 0.956, "_source": {"product_description": "Red apple."} },
  { "_id": "1", "_score": 0.743, "_source": {"product_description": "Red apple from USA."} },
  { "_id": "3", "_score": 0.611, "_source": {"product_description": "Crispy apple."} }
]
```

By using MMR, we introduce more diverse results (like "Crispy apple") without sacrificing relevance for the top hits.
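In application code, the `ext.mmr` section can be attached to an existing query body before it is sent to OpenSearch. The small helper below is our own illustrative sketch (the function name `with_mmr` is not part of any OpenSearch client library):

```python
def with_mmr(query_body, candidates=10, diversity=0.4):
    """Return a copy of an OpenSearch query body with MMR reranking enabled.

    Illustrative helper, not a client-library API. Adds the ext.mmr section
    that native MMR in OpenSearch 3.3 reads.
    """
    body = dict(query_body)  # shallow copy; the caller's body is left untouched
    body["ext"] = {"mmr": {"candidates": candidates, "diversity": diversity}}
    return body

query = {
    "size": 3,
    "query": {"neural": {"product_description": {"query_text": "Red apple"}}},
}
mmr_query = with_mmr(query, candidates=10, diversity=0.4)
```

Keeping the MMR settings in one place like this makes it easy to A/B test different `diversity` values without touching the underlying query.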
## Benchmarking MMR Reranking in OpenSearch

To evaluate the performance impact of Maximal Marginal Relevance (MMR) reranking, we ran benchmark tests on OpenSearch 3.3 across both [vector search](https://github.com/opensearch-project/opensearch-benchmark-workloads/blob/main/vectorsearch/params/corpus/10million/faiss-cohere-768-dp.json) and [neural-search](https://github.com/opensearch-project/opensearch-benchmark-workloads/blob/main/neural_search/params/semanticfield/neural_search_semantic_field_dense_model.json) workloads. These tests help quantify the latency trade-offs introduced by MMR while highlighting the benefits of more diverse search results.

### Cluster configuration

The following OpenSearch cluster configuration was used:

* Version: OpenSearch 3.3
* Data nodes: 3 × r6g.2xlarge
* Master nodes: 3 × c6g.xlarge
* Benchmark instance: c6g.large
### Vector Search Performance

We used the cohere-1m dataset, which contains one million precomputed embeddings, to evaluate k-nearest neighbor (KNN) queries. The table below summarizes query latency (in milliseconds) for different values of k and MMR candidate sizes:

| k | Query size | MMR candidates | KNN p50 (no MMR) | KNN p90 (no MMR) | KNN p50 (with MMR) | KNN p90 (with MMR) | p50 Δ (%) | p90 Δ (%) | p50 Δ (ms) | p90 Δ (ms) |
| --- | ---------- | -------------- | ---------------- | ---------------- | ------------------ | ------------------ | --------- | --------- | ---------- | ---------- |
| 1 | 1 | 1 | 6.70 | 7.19 | 8.22 | 8.79 | 22.7 | 22.2 | 1.52 | 1.60 |
| 10 | 10 | 10 | 8.09 | 8.64 | 9.14 | 9.62 | 13.0 | 11.3 | 1.05 | 0.98 |
| 30 | 10 | 30 | 7.85 | 8.40 | 10.83 | 11.48 | 37.9 | 36.7 | 2.98 | 3.08 |
| 50 | 10 | 50 | 7.17 | 7.63 | 11.76 | 12.55 | 64.1 | 64.5 | 4.59 | 4.92 |
| 50 | 20 | 50 | 8.04 | 8.57 | 14.08 | 14.94 | 75.0 | 74.4 | 6.04 | 6.37 |
| 50 | 50 | 50 | 8.34 | 8.91 | 17.25 | 17.94 | 106.8 | 101.3 | 8.91 | 9.03 |
| 100 | 10 | 100 | 7.92 | 8.46 | 15.81 | 16.73 | 99.7 | 97.7 | 7.89 | 8.27 |
### Neural Search Performance

For neural search, we used the Quora dataset, containing over 500,000 documents. The table below shows query latency with and without MMR reranking:

| k | Query size | MMR candidates | Neural p50 (no MMR) | Neural p90 (no MMR) | Neural p50 (with MMR) | Neural p90 (with MMR) | p50 Δ (%) | p90 Δ (%) | p50 Δ (ms) | p90 Δ (ms) |
| --- | ---------- | -------------- | ------------------- | ------------------- | --------------------- | --------------------- | --------- | --------- | ---------- | ---------- |
| 1 | 1 | 1 | 113.59 | 122.22 | 113.08 | 122.38 | -0.46 | 0.13 | -0.52 | 0.16 |
| 10 | 10 | 10 | 112.03 | 122.90 | 113.88 | 122.63 | 1.66 | -0.22 | 1.86 | -0.27 |
| 30 | 10 | 30 | 112.09 | 118.82 | 119.57 | 127.65 | 6.67 | 7.42 | 7.48 | 8.82 |
| 50 | 10 | 50 | 113.48 | 126.35 | 122.56 | 133.34 | 8.00 | 5.53 | 9.08 | 6.99 |
| 50 | 20 | 50 | 113.80 | 125.20 | 122.94 | 134.80 | 8.04 | 7.67 | 9.14 | 9.60 |
| 50 | 50 | 50 | 113.36 | 125.54 | 128.33 | 136.52 | 13.21 | 8.74 | 14.97 | 10.97 |
| 100 | 10 | 100 | 119.04 | 128.71 | 130.52 | 139.95 | 9.65 | 8.73 | 11.48 | 11.24 |
### Key Observations

1. MMR adds latency, and the increase grows with the number of MMR candidates.
2. KNN and neural queries without MMR scale well with k. The dominant cost comes from graph traversal (ef_search), not from selecting the top k candidates.

Choosing the number of MMR candidates requires balancing diversity against query latency. More candidates improve result diversity but increase latency, so select values appropriate for your workload.
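The Δ columns in the tables above are simple absolute and relative differences between the with-MMR and no-MMR latencies. As a quick worked example, here is the p50 row for k=50, query size 10 from the vector search table:

```python
# Recompute the latency deltas for one benchmark row (k=50, query size 10,
# 50 MMR candidates) from the vector search table above.
p50_no_mmr, p50_with_mmr = 7.17, 11.76

delta_ms = p50_with_mmr - p50_no_mmr      # absolute increase in milliseconds
delta_pct = 100 * delta_ms / p50_no_mmr   # relative increase in percent

# ~4.59 ms and ~64%; the table's 64.1% was presumably computed from
# unrounded raw measurements, hence the small rounding difference.
print(f"{delta_ms:.2f} ms, {delta_pct:.1f} %")
```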
## Using MMR with Cross-cluster Search

Currently, for [cross-cluster search](https://docs.opensearch.org/latest/search-plugins/cross-cluster-search/), OpenSearch cannot automatically resolve vector field information from the index mapping in the remote clusters. This means users must explicitly provide the vector field details when using MMR.

Here's an example query:

```json
POST /my-index/_search
{
  "query": {
    "neural": {
      "my_vector_field": {
        "query_text": "query text",
        "model_id": "<your model id>"
      }
    }
  },
  "ext": {
    "mmr": {
      "diversity": 0.5,
      "candidates": 10,
      "vector_field_path": "my_vector_field",
      "vector_field_data_type": "float",
      "vector_field_space_type": "l2"
    }
  }
}
```

The MMR parameters for remote clusters are:

- **vector_field_path:** Path to the vector field to use for MMR re-ranking.
- **vector_field_data_type:** Data type of the vector (e.g., `float`).
- **vector_field_space_type:** Distance metric used for similarity calculations (e.g., `l2`).
- **candidates** and **diversity:** Same as in local MMR queries, controlling the number of candidates and the diversity weight.

Providing this information ensures that MMR can correctly compute diversity and re-rank results even when querying across remote clusters.
## Summary

OpenSearch's Maximal Marginal Relevance (MMR) feature makes it easy to deliver search results that are both relevant and diverse. By intelligently re-ranking results, MMR helps surface a wider variety of options, reduces redundancy, and creates a richer, more engaging search experience for your users.

If you're looking to improve your vector search diversity, MMR in OpenSearch is a powerful tool to try today.

## What's Next

In the future, we can make MMR even easier and more flexible:

- **Better support for remote clusters:** Removing the need to manually specify vector field information.
- **Expanded query type support:** Currently, MMR supports only k-NN queries and neural queries on `knn_vector` fields. We could potentially support more query types, such as `bool` and hybrid queries, so that MMR can enhance a wider variety of search scenarios.
