 - **Vector store size impact varies by model**: GPT-4.1 series shows minimal latency impact across vector store sizes, while GPT-5 series shows significant increases.
@@ -238,10 +232,6 @@ In addition to the above evaluations which use a 3 MB sized vector store, the ha
 || Extra Large (105 MB) | 0.636 | 0.528 | 0.528 | 0.528 |

 **Key Insights:**

 - **Best Performance**: gpt-5-mini consistently achieves the highest ROC AUC scores across all vector store sizes (0.909-0.939)
-- **Best Latency**: gpt-4.1-nano shows the most consistent and lowest latency across all scales (4,171-4,809ms P50) but shows poor performance
+- **Best Latency**: gpt-4.1-mini (default) provides the lowest median latencies while maintaining strong accuracy
 - **Most Stable**: gpt-4.1-mini (default) maintains relatively stable performance across vector store sizes with good accuracy-latency balance
 - **Scale Sensitivity**: gpt-5 shows the most variability in performance across vector store sizes, with performance dropping significantly at larger scales
 - **Performance vs Scale**: Most models show decreasing performance as vector store size increases, with gpt-5-mini being the most resilient
@@ -268,4 +254,4 @@ In addition to the above evaluations which use a 3 MB sized vector store, the ha
 - **Signal-to-noise ratio degradation**: Larger vector stores contain more irrelevant documents that may not be relevant to the specific factual claims being validated
 - **Semantic search limitations**: File search retrieves semantically similar documents, but with a large diverse knowledge source, these may not always be factually relevant
 - **Document quality matters more than quantity**: The relevance and accuracy of documents is more important than the total number of documents
-- **Performance plateaus**: Beyond a certain size (11 MB), the performance impact becomes less severe
+- **Performance plateaus**: Beyond a certain size (11 MB), the performance impact becomes less severe