feat(observer): add context usage metrics endpoint #1361

Open
mvanhorn wants to merge 1 commit into volcengine:main from mvanhorn:feat/observer-usage-metrics

mvanhorn (Contributor) commented Apr 10, 2026

Description

Add a /api/v1/observer/usage endpoint that returns context usage metrics (vector count from VikingDB). The existing observer system shows component health (healthy/unhealthy) but provides no usage data. This endpoint answers "how much is stored?" alongside the existing "is it working?" checks.

Related Issue

No existing issue. This fills a gap in the observer system where health checks exist but usage metrics do not.

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

  • Add UsageObserver class in openviking/storage/observers/usage_observer.py extending BaseObserver
  • Register UsageObserver in observers/__init__.py
  • Add usage() method to ObserverService in debug_service.py
  • Add /api/v1/observer/usage endpoint in observer.py router

The implementation follows the same pattern as VikingDBObserver and QueueObserver. The UsageObserver queries vikingdb_manager.count() for total vector count and formats the result using tabulate.
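A minimal sketch of the observer shape described above, under stated assumptions: `BaseObserver` is reduced to `is_healthy()` and `has_errors()`, the manager's `count()` is assumed to be an async method accepting `ctx`, and `FakeVikingDBManager` and `_format_table` are hypothetical stand-ins (the PR formats with `tabulate`, which this stdlib-only sketch avoids). Unlike the PR, `get_status_table` here is async, in line with the reviewer suggestions further down the thread; the real openviking signatures may differ.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class BaseObserver(ABC):
    """Hypothetical, simplified stand-in for the project's BaseObserver."""

    @abstractmethod
    def is_healthy(self) -> bool: ...

    @abstractmethod
    def has_errors(self) -> bool: ...


class FakeVikingDBManager:
    """Hypothetical manager whose async count() returns a fixed vector total."""

    def __init__(self, total: int) -> None:
        self._total = total

    async def count(self, ctx: Optional[Any] = None) -> int:
        return self._total


def _format_table(rows: List[Dict[str, Any]]) -> str:
    """Tiny plain-text table formatter used here in place of tabulate."""
    headers = list(rows[0].keys())
    widths = [max(len(str(h)), *(len(str(r[h])) for r in rows)) for h in headers]
    lines = [" | ".join(str(h).ljust(w) for h, w in zip(headers, widths))]
    lines.append("-+-".join("-" * w for w in widths))
    for r in rows:
        lines.append(" | ".join(str(r[h]).ljust(w) for h, w in zip(headers, widths)))
    return "\n".join(lines)


class UsageObserver(BaseObserver):
    """Reports consumption metrics (total vectors) rather than health."""

    def __init__(self, vikingdb_manager: FakeVikingDBManager) -> None:
        self._vikingdb = vikingdb_manager

    def is_healthy(self) -> bool:
        # Usage metrics carry no health signal of their own.
        return True

    def has_errors(self) -> bool:
        return False

    async def get_usage_async(self, ctx: Optional[Any] = None) -> Dict[str, Any]:
        total = await self._vikingdb.count(ctx=ctx)
        return {"total_vectors": total}

    async def get_status_table(self, ctx: Optional[Any] = None) -> str:
        usage = await self.get_usage_async(ctx=ctx)
        rows = [{"Metric": "Total Vectors", "Value": usage.get("total_vectors", 0)}]
        return _format_table(rows)


if __name__ == "__main__":
    observer = UsageObserver(FakeVikingDBManager(total=1234))
    print(asyncio.run(observer.get_status_table()))
```

Keeping the observer async end to end lets the router `await` it directly; a synchronous wrapper is then only needed for callers that have no event loop.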

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have tested this on the following platforms:
    • Linux
    • macOS
    • Windows

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Screenshots (if applicable)

Demo

usage-endpoint-demo

Shows the /api/v1/observer/usage endpoint returning vector count, the usage component appearing in the system status, and a comparison with the existing vikingdb observer endpoint.

Additional Notes

The UsageObserver reuses the same vikingdb_manager.count() call that VikingDBObserver._get_collection_statuses() already makes. The difference is framing: VikingDB observer focuses on collection health (index count, error state), while the usage observer focuses on consumption metrics (total vectors). Future extensions could add per-user breakdowns using RequestContext scoping.

This contribution was developed with AI assistance (Claude Code).

Add UsageObserver and /api/v1/observer/usage endpoint that returns
vector count from VikingDB. Extends the existing observer pattern
(BaseObserver + router endpoint) to provide usage visibility alongside
the existing health checks.

The endpoint follows the same auth flow as other observer endpoints,
scoping results to the authenticated user's context.

@github-actions

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🏅 Score: 82
🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Async Discipline Violation

get_status_table calls the synchronous get_usage, which wraps get_usage_async in run_async. When the async /usage endpoint reaches this path, run_async blocks until the count query finishes, starving the event loop.

def get_usage(self, ctx: Optional[RequestContext] = None) -> Dict[str, Any]:
    """Synchronous wrapper for get_usage_async."""
    return run_async(self.get_usage_async(ctx=ctx))

def get_status_table(self, ctx: Optional[RequestContext] = None) -> str:
    """Format usage metrics as a table."""
    from tabulate import tabulate

    usage = self.get_usage(ctx=ctx)

    data = [
        {"Metric": "Total Vectors", "Value": usage.get("total_vectors", 0)},
    ]

    return tabulate(data, headers="keys", tablefmt="pretty")
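
To illustrate the reviewer's point, here is a hedged sketch of an async-first layout, assuming `count()` is a coroutine. `fake_count`, `usage_endpoint`, and the running-loop guard in `get_usage` are illustrative choices, not the project's `run_async` helper, whose behavior may differ. The async handler awaits `get_usage_async` directly, and the sync wrapper refuses to run inside an active event loop instead of blocking it.

```python
import asyncio
from typing import Any, Dict, Optional


async def fake_count(ctx: Optional[Any] = None) -> int:
    """Hypothetical stand-in for an async vikingdb_manager.count()."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return 42


async def get_usage_async(ctx: Optional[Any] = None) -> Dict[str, Any]:
    return {"total_vectors": await fake_count(ctx=ctx)}


def get_usage(ctx: Optional[Any] = None) -> Dict[str, Any]:
    """Sync wrapper that is only safe when no event loop is running."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop in this thread: safe to start one for this call.
        return asyncio.run(get_usage_async(ctx=ctx))
    raise RuntimeError("get_usage() called from async code; await get_usage_async() instead")


async def usage_endpoint(ctx: Optional[Any] = None) -> Dict[str, Any]:
    """Async handler: awaits the coroutine directly instead of calling get_usage()."""
    return await get_usage_async(ctx=ctx)
```

`get_usage()` still works from plain synchronous code, since it starts its own loop via `asyncio.run`, while calling it from inside a coroutine raises immediately, which surfaces the mistake instead of silently stalling other requests.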

Missing Usage Component in System Status

The system() method's components dictionary does not include the new 'usage' observer, so usage metrics are omitted from the overall system status.

def system(self, ctx: Optional[RequestContext] = None) -> SystemStatus:
    """Get system overall status."""
    components = {
        "queue": self.queue,
        "vikingdb": self.vikingdb(ctx=ctx),
        "models": self.models,

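A sketch of the fix for this observation, assuming simplified `ComponentStatus` and `SystemStatus` dataclasses (field names are guesses, not taken from openviking) and static component results in place of the real observers. The substantive change is the extra `"usage"` entry; computing overall health as "every component healthy and error-free" is an assumption about how `system()` aggregates. If `usage()` is made async, as the code suggestions propose, `system()` would need to await it as well.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ComponentStatus:
    name: str
    is_healthy: bool
    has_errors: bool
    status: str


@dataclass
class SystemStatus:
    is_healthy: bool
    components: Dict[str, ComponentStatus]


def build_system_status(components: Dict[str, ComponentStatus]) -> SystemStatus:
    """Aggregate per-component results; overall health requires every component healthy."""
    healthy = all(c.is_healthy and not c.has_errors for c in components.values())
    return SystemStatus(is_healthy=healthy, components=components)


def system() -> SystemStatus:
    # Static placeholder results; the real method queries each observer.
    components = {
        "queue": ComponentStatus("queue", True, False, "ok"),
        "vikingdb": ComponentStatus("vikingdb", True, False, "ok"),
        "models": ComponentStatus("models", True, False, "ok"),
        # The reviewer's point: register the new observer here too.
        "usage": ComponentStatus("usage", True, False, "Total Vectors: 42"),
    }
    return build_system_status(components)
```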
@github-actions

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
General
Make get_status_table async

Make get_status_table async and call get_usage_async directly to avoid the run_async wrapper, which could block the event loop when called from an async context.

openviking/storage/observers/usage_observer.py [54-58]

-def get_status_table(self, ctx: Optional[RequestContext] = None) -> str:
+async def get_status_table(self, ctx: Optional[RequestContext] = None) -> str:
     """Format usage metrics as a table."""
     from tabulate import tabulate
 
-    usage = self.get_usage(ctx=ctx)
+    usage = await self.get_usage_async(ctx=ctx)
Suggestion importance [1-10]: 6

Why: Modifies get_status_table to be async and call get_usage_async directly, which helps avoid potential event loop blocking from the run_async wrapper. However, this change requires updates to callers to function correctly.

Low
Make usage method async

Make the usage method async to match the async get_status_table method, so that it does not block the event loop when called from the async API endpoint.

openviking/service/debug_service.py [189-197]

-def usage(self, ctx: Optional[RequestContext] = None) -> ComponentStatus:
+async def usage(self, ctx: Optional[RequestContext] = None) -> ComponentStatus:
     """Get context usage metrics."""
     observer = UsageObserver(self._vikingdb)
     return ComponentStatus(
         name="usage",
         is_healthy=observer.is_healthy(),
         has_errors=observer.has_errors(),
-        status=observer.get_status_table(ctx=ctx),
+        status=await observer.get_status_table(ctx=ctx),
     )
Suggestion importance [1-10]: 6

Why: Makes the usage method async and awaits get_status_table, which is required if get_status_table is made async. This helps prevent blocking the event loop when called from the async endpoint.

Low
Possible issue
Await the async usage call

Await the usage method once it has been converted to async; without await, the endpoint receives a coroutine object instead of a ComponentStatus.

openviking/server/routers/observer.py [101]

-component = service.debug.observer.usage(ctx=ctx)
+component = await service.debug.observer.usage(ctx=ctx)
Suggestion importance [1-10]: 6

Why: Adds await to the usage call in the async endpoint, which is required if the usage method is made async. This ensures the coroutine is properly executed.

Low
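
The third suggestion matters because calling an `async def` method without `await` does not run it; the caller gets a coroutine object back. A minimal sketch of both versions of the endpoint, using a hypothetical `ObserverService` stand-in whose `usage()` is already async:

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ComponentStatus:
    name: str
    status: str


class ObserverService:
    """Hypothetical stand-in for service.debug.observer once usage() is async."""

    async def usage(self) -> ComponentStatus:
        await asyncio.sleep(0)  # stands in for the awaited count query
        return ComponentStatus(name="usage", status="Total Vectors: 42")


async def usage_endpoint_missing_await(observer: ObserverService):
    # Bug: usage() is never executed; this returns a coroutine object.
    return observer.usage()


async def usage_endpoint(observer: ObserverService) -> ComponentStatus:
    # Fix: await the coroutine so the endpoint returns a ComponentStatus.
    return await observer.usage()


if __name__ == "__main__":
    service = ObserverService()
    print(asyncio.run(usage_endpoint(service)))
```

In the unawaited version the handler would pass a coroutine object to the response layer, and Python emits a "coroutine ... was never awaited" RuntimeWarning when it is garbage-collected, so the second and third suggestions need to land together.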

Projects

Status: Backlog
