Error message should include variables we've already computed. Could be useful. Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Improved comment. Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- /status now fetches JVM heap, CPU load, and OS memory from Solr's admin/info/system endpoint (in parallel with existing core status call)
- /status now fetches filterCache and queryResultCache hit ratios and eviction counts from Solr's MBeans endpoint
- recent_queries in /status now includes p50/p95/p99 percentiles alongside the existing mean, to distinguish GC pauses from sustained load
- lookup() now emits WARNING instead of INFO for queries exceeding SLOW_QUERY_THRESHOLD_MS (default 500ms, configurable via env var)
- Added documentation/Performance.md with a diagnostic decision tree explaining how to use these metrics to identify CPU vs memory pressure

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
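A minimal sketch of the percentile and slow-query ideas in this commit, not the code in api/server.py. The function names `summarize_latencies` and `log_level_for` are hypothetical, and using `statistics.quantiles` for p50/p95/p99 is an assumption; only the `SLOW_QUERY_THRESHOLD_MS` name and its 500 ms default come from the commit message.

```python
import logging
import os
import statistics

# Default of 500 ms, overridable via environment variable (as the commit describes).
SLOW_QUERY_THRESHOLD_MS = float(os.environ.get("SLOW_QUERY_THRESHOLD_MS", "500"))


def summarize_latencies(durations_ms):
    """Return mean and p50/p95/p99 for a list of durations in milliseconds.

    Hypothetical helper: returns None when there are fewer than two samples,
    because statistics.quantiles needs at least two data points.
    """
    if len(durations_ms) < 2:
        return None
    # n=100 yields 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(durations_ms, n=100)
    return {
        "mean": statistics.fmean(durations_ms),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
    }


def log_level_for(duration_ms, threshold_ms=SLOW_QUERY_THRESHOLD_MS):
    """Pick WARNING for queries exceeding the threshold, INFO otherwise."""
    return logging.WARNING if duration_ms > threshold_ms else logging.INFO
```

A p99 far above p50 with a normal mean suggests occasional stalls (for example GC pauses), while all three rising together suggests sustained load.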
Tracks query start timestamps in a separate deque (RECENT_QUERY_TIMESTAMPS_COUNT, default 50k entries) independent of the latency deque. /status now reports queries_last_60s, queries_per_second_last_60s, queries_last_300s, and queries_per_second_last_300s under recent_queries.rate. The large deque size ensures rate estimates remain meaningful at high query rates (500 qps fills 1000 entries in 2s but covers 100s with 50k entries). Rate is computed by scanning from newest to oldest and stopping at the window boundary, so it's O(window_size) not O(deque_size). Updated documentation/Performance.md to reflect all implemented metrics and added query rate as Step 1 in the diagnostic decision tree. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
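The newest-to-oldest scan described above could look roughly like the following. This is a sketch under assumptions: `count_in_window` is a hypothetical name, and it assumes timestamps were appended in non-decreasing order so the scan can stop at the first entry outside the window.

```python
import time
from collections import deque


def count_in_window(timestamps, window_s, now=None):
    """Count entries with timestamp >= now - window_s.

    Scans from the newest entry backwards and stops at the window boundary,
    so the cost depends on the number of entries in the window rather than
    on the full deque size.
    """
    now = time.time() if now is None else now
    cutoff = now - window_s
    count = 0
    for ts in reversed(timestamps):
        if ts < cutoff:
            break  # everything older is outside the window
        count += 1
    return count


# Made-up timestamps (seconds) in a bounded deque, as in the commit's 50k default.
recent = deque([0, 10, 50, 95, 99], maxlen=50_000)
queries_last_60s = count_in_window(recent, 60, now=100)
queries_per_second_last_60s = queries_last_60s / 60
```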
Index stats (numDocs, segmentCount, size, etc.), jvm, os, and cache are now nested under 'solr' in the response, making it clear which fields come from the database vs. the Python frontend. 'recent_queries' remains at the top level as it is tracked by the Python process. Updated documentation/Performance.md to reflect the new structure. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces the two separate deques (recent_query_times and recent_query_timestamps) with a single query_log deque of (timestamp_s, duration_ms) tuples, controlled by QUERY_LOG_SIZE (default 50,000). Expands recent_queries.rate with:
- history_span_seconds: how much history the log covers
- time_since_last_query_seconds: staleness indicator
- queries_last_10s / queries_per_second_last_10s: spike detection
- inter_arrival_ms: mean, median, min, max, p95 gaps between queries

recent_times_ms is now capped at the 1000 most recent entries in the response to avoid large payloads from the larger log.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The raw list of durations is redundant now that mean, p50, p95, and p99 are reported, and the full data is available in query_log for deeper analysis. Keeping a list of up to 1000 floats inline in a status response is noisy. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pull request overview
This PR prepares the v1.5.2 release by extending /status with performance/diagnostic metrics, adding operational documentation, and updating release/version metadata.
Changes:
- Expand `/status` to include `recent_queries` metrics and additional Solr JVM/OS/cache diagnostics.
- Add a performance diagnostics guide documenting `/status` fields and log interpretation.
- Bump OpenAPI version to 1.5.2 and adjust the release workflow trigger.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| api/server.py | Adds query timing/logging, query-rate/latency aggregation, and Solr sysinfo/cache collection surfaced via /status. |
| tests/test_status.py | Adds a /status endpoint test. |
| documentation/Performance.md | New guide explaining /status metrics and operational troubleshooting steps. |
| api/resources/openapi.yml | Version bump to 1.5.2. |
| .github/workflows/release-name-resolution.yml | Adds pull_request trigger to the release workflow. |
tests/test_status.py
    assert status['version'] > 1
    assert status['size'] != ''
    assert status['startTime']

    # Count the specific number of test documents we load.
    assert status['numDocs'] == 89
    assert status['maxDoc'] == 89
    assert status['deletedDocs'] == 0
This test asserts legacy top-level /status fields (e.g., numDocs/maxDoc/deletedDocs/version/size/startTime). In the updated /status response those fields are nested under the solr key, so these assertions will fail. Update the test to read from status['solr'][...] (and adjust the version assertion accordingly).
Suggested change:

    - assert status['version'] > 1
    - assert status['size'] != ''
    - assert status['startTime']
    - # Count the specific number of test documents we load.
    - assert status['numDocs'] == 89
    - assert status['maxDoc'] == 89
    - assert status['deletedDocs'] == 0
    + assert 'solr' in status
    + assert status['solr']['version'] != ''
    + assert status['solr']['size'] != ''
    + assert status['solr']['startTime']
    + # Count the specific number of test documents we load.
    + assert status['solr']['numDocs'] == 89
    + assert status['solr']['maxDoc'] == 89
    + assert status['solr']['deletedDocs'] == 0
    on:
      pull_request:
      release:
        types: [published]
The release workflow now triggers on pull_request, but the job always logs into GHCR and runs docker/build-push-action with push: true. On PRs this can publish images unintentionally and also references github.event.release.target_commitish, which won't exist for PR events. Consider removing the pull_request trigger or gating the push/build-args to release events only (e.g., if: github.event_name == 'release').
Concurrent requests complete in a different order than they started, so insertion order in query_log does not reflect arrival order. Sort the snapshot by timestamp before computing inter-arrival gaps. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- inter_arrival_ms: guard requires >= 3 timestamps (>= 2 gaps) since statistics.quantiles needs at least 2 data points; previously crashed with exactly 2 log entries
- test_status.py: update field access to match new 'solr' key structure; fix status['version'] > 1 (version is a string, not an int)
- release workflow: remove accidental pull_request trigger that would have run the Docker publish job on every PR

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
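The sort-before-diff fix and the three-timestamp guard can be illustrated together. This is a sketch, not the code in api/server.py: `inter_arrival_stats` is a hypothetical name, and the (timestamp_s, duration_ms) tuple layout follows the earlier query_log commit message.

```python
import statistics


def inter_arrival_stats(query_log):
    """Compute gap statistics (ms) between consecutive query timestamps.

    query_log is an iterable of (timestamp_s, duration_ms) tuples. Entries are
    sorted by timestamp first, because concurrent requests may be logged in
    completion order rather than arrival order.
    """
    timestamps = sorted(ts for ts, _duration_ms in query_log)
    # Three timestamps give two gaps, the minimum statistics.quantiles accepts.
    if len(timestamps) < 3:
        return None
    gaps_ms = [(b - a) * 1000.0 for a, b in zip(timestamps, timestamps[1:])]
    return {
        "mean": statistics.fmean(gaps_ms),
        "median": statistics.median(gaps_ms),
        "min": min(gaps_ms),
        "max": max(gaps_ms),
        # n=20 yields 19 cut points; the last one is the 95th percentile.
        "p95": statistics.quantiles(gaps_ms, n=20)[18],
    }
```

Without the sort, an out-of-order log would produce negative gaps; without the guard, two log entries yield a single gap and `statistics.quantiles` raises `StatisticsError`.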
Calculates used percentage from free/total values, returning None when either value is missing or invalid (zero total, negative free, free > total). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
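A minimal sketch of the validation rules listed in this commit message. The function name `used_percent` is hypothetical, and whether the real helper rounds its result is not stated, so no rounding is applied here.

```python
def used_percent(free, total):
    """Return the used percentage given free and total amounts, or None.

    None is returned when either value is missing or the pair is invalid:
    zero (or negative) total, negative free, or free greater than total.
    """
    if free is None or total is None:
        return None
    if total <= 0 or free < 0 or free > total:
        return None
    return (total - free) / total * 100
```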
Introduces SolrClient with static parse_* methods (parse_jvm, parse_os, parse_cache, parse_index) and async fetch_* methods, plus a high-level fetch_status() that orchestrates all three Solr calls concurrently. status() in server.py shrinks from ~200 to ~100 lines. Also fixes the broken cache stat extraction: Solr stores stats under namespaced keys (CACHE.searcher.<name>.hitratio) not bare keys, and there is no maxSize stat. Adds lookups and hits fields instead. 14 new unit tests in tests/test_solr.py cover all parsers without requiring a running Solr instance. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
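To illustrate the namespaced-key fix, here is a sketch of a parser that reads `CACHE.searcher.<name>.<stat>` keys. It is not the `SolrClient.parse_cache` from this PR: the signature, the flat `stats` dict it accepts, and the example numbers are all assumptions made for illustration.

```python
def parse_cache(name, stats):
    """Extract selected stats for one cache from a namespaced stats dict.

    Assumes `stats` maps keys such as "CACHE.searcher.filterCache.hitratio"
    to values. Missing stats come back as None rather than raising.
    """
    prefix = f"CACHE.searcher.{name}."
    fields = ("hitratio", "evictions", "size", "lookups", "hits")
    return {field: stats.get(prefix + field) for field in fields}


# Made-up values, shaped like the namespaced keys the commit describes.
example_stats = {
    "CACHE.searcher.filterCache.hitratio": 0.92,
    "CACHE.searcher.filterCache.evictions": 3,
    "CACHE.searcher.filterCache.size": 512,
    "CACHE.searcher.filterCache.lookups": 1000,
    "CACHE.searcher.filterCache.hits": 920,
}
filter_cache = parse_cache("filterCache", example_stats)
```

Looking up a bare key such as `hitratio` in this dict would return nothing, which matches the broken extraction the commit says it fixed.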
Reports what fraction of logged queries fall into each performance tier: ideal (<100ms), fine (<SLOW_QUERY_THRESHOLD_MS), slow (<1000ms), and very_slow (>=1000ms). Includes slow_threshold_ms so the split point is self-documenting in the response. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
By default /status now makes only one Solr call (cores endpoint), returning basic index stats (numDocs, startTime, etc.) with jvm, os, and cache as null. Pass ?full=true to restore the previous behaviour of fetching all three endpoints concurrently. This makes the default path suitable for frequent Kubernetes liveness probes without hammering Solr with sysinfo and mbeans requests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
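The light-versus-full behaviour can be sketched with `asyncio.gather`. Everything here is illustrative: the `fetch_status` signature, the flat response shape, and the stand-in fetchers with made-up payloads are assumptions, not the PR's `SolrClient` API.

```python
import asyncio


async def fetch_status(fetch_index, fetch_sysinfo, fetch_cache, full=False):
    """Fetch index stats only by default; fetch everything concurrently if full."""
    if not full:
        index = await fetch_index()
        return {**index, "jvm": None, "os": None, "cache": None}
    # Run all three calls concurrently and wait for all of them.
    index, sysinfo, cache = await asyncio.gather(
        fetch_index(), fetch_sysinfo(), fetch_cache()
    )
    return {**index, "jvm": sysinfo.get("jvm"), "os": sysinfo.get("os"), "cache": cache}


# Stand-in fetchers (hypothetical) that record which backend calls were made.
calls = []


async def fake_index():
    calls.append("index")
    return {"numDocs": 89}


async def fake_sysinfo():
    calls.append("sysinfo")
    return {"jvm": {"heap_used_percent": 40.0}, "os": {"load": 0.5}}


async def fake_cache():
    calls.append("cache")
    return {"filterCache": {"hitratio": 0.9}}
```

With `full=False` only the index fetcher runs, which is what keeps a frequently polled probe cheap.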
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace four separate generator-sum passes over durations with a single for-loop (O(4n) → O(n)). Extract magic numbers 100 and 1000 as IDEAL_QUERY_THRESHOLD_MS and VERY_SLOW_QUERY_THRESHOLD_MS to match the existing SLOW_QUERY_THRESHOLD_MS constant pattern. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
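A sketch of the single-pass tier count, assuming the tier boundaries given in the earlier commit message (ideal below 100 ms, fine below the slow threshold, slow below 1000 ms, very_slow otherwise). The function name `latency_tiers`, the response field names, and the behaviour on an empty log are assumptions.

```python
IDEAL_QUERY_THRESHOLD_MS = 100
SLOW_QUERY_THRESHOLD_MS = 500  # default from the earlier commit; configurable there
VERY_SLOW_QUERY_THRESHOLD_MS = 1000


def latency_tiers(durations_ms):
    """Return the fraction of queries in each tier using one pass over the data."""
    counts = {"ideal": 0, "fine": 0, "slow": 0, "very_slow": 0}
    for duration in durations_ms:
        if duration < IDEAL_QUERY_THRESHOLD_MS:
            counts["ideal"] += 1
        elif duration < SLOW_QUERY_THRESHOLD_MS:
            counts["fine"] += 1
        elif duration < VERY_SLOW_QUERY_THRESHOLD_MS:
            counts["slow"] += 1
        else:
            counts["very_slow"] += 1
    total = len(durations_ms)
    # Assumption: report zeros rather than dividing by zero on an empty log.
    fractions = {tier: (n / total if total else 0.0) for tier, n in counts.items()}
    fractions["slow_threshold_ms"] = SLOW_QUERY_THRESHOLD_MS
    return fractions
```

Because the branches are mutually exclusive and checked in ascending order, each duration lands in exactly one tier and the four fractions sum to 1 for a non-empty log.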
- Performance.md: replace `size/maxSize` cache table row with the three fields actually returned by parse_cache() (size, lookups, hits); maxSize is not in the API response
- data-loading/README.md: fix typo "repical" → "replica" in backup path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds some metrics for tracking performance on the Solr database:
- `?full=true` mode to the /status endpoint which provides more detailed information from Solr, including memory/CPU information and cache information.
- [...] `master` but not into branch `ci`).

Also some unrelated changes:
- `reverse_lookup()` to `curie_lookup()` and `lookup_names_(get|post)` with `synonyms_(get|post)` for clarity.