
Conversation

@sammykao

Total Latency Logging for each attempt

This PR adds latency measurement for each model generation attempt in garak. The measured latency (in seconds) is now recorded in the notes field of each Attempt entry in the report .jsonl output.


Where

  • Modified garak/probes/base.py
  • Timing logic added around the generator's generate method inside _execute_attempt
  • Latency is stored as:
    "notes": {
      "latency": <float_seconds>
    }

How

  • Latency is measured using wall-clock time from just before the model is called to just after the response is received.
  • Recorded as a float in seconds.
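A minimal sketch of the approach described above, with stand-in names: `Attempt` here is a simplified placeholder for garak's attempt object, and `timed_generate` is an illustrative helper, not the actual `_execute_attempt` code.

```python
import time

class Attempt:
    # Simplified stand-in for garak's Attempt; the real class differs.
    def __init__(self, prompt):
        self.prompt = prompt
        self.notes = {}

def timed_generate(generator_fn, attempt):
    # Wall-clock timing from just before the model call to just after
    # the response is received, stored as float seconds in notes.
    start_time = time.time()
    outputs = generator_fn(attempt.prompt)
    attempt.notes["latency"] = time.time() - start_time
    return outputs
```

Note that `time.monotonic()` is generally preferred over `time.time()` for measuring intervals, since it is unaffected by system clock adjustments.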

Why

  • Enables performance benchmarking and response time analysis.
  • Helps identify:
    • Slow probes
    • Generator inefficiencies
    • Infrastructure bottlenecks

Example Output

{
  "entry_type": "attempt",
  ...
  "notes": {
    "latency": 0.12345
  },
  ...
}

Additional Notes

  • Measures total latency per attempt (including multi-turn sequences).
  • Fully backward-compatible: no changes made to the report format or analysis scripts.

@sammykao
Author

I have read the DCO Document and I hereby sign the DCO

Collaborator

@jmartin-tech left a comment


This is an interesting enhancement; is there a specific need it is fulfilling? While timing each inference is interesting, I am not sure this will offer the desired values, as most generators implement backoff functions to account for errors and rate limits.

As implemented, a generator that needs to retry or back off would store a latency value that may not accurately represent the time spent on a single inference; and for generators that produce multiple generations via repeated calls inside generate(), this would not show how that time was allocated.

Tests also need to account for timing in a controlled way; this is often done by mocking calls to time.time() in tests where it could have an impact.
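A sketch of that mocking technique: patching `time.time` with `unittest.mock` makes the measured latency deterministic. The `measure` helper here is illustrative, not garak code.

```python
import time
from unittest import mock

def measure(fn):
    # Return (result, elapsed_seconds) using wall-clock time.
    start = time.time()
    result = fn()
    return result, time.time() - start

# Patch time.time so the two calls inside measure() return fixed
# values; the measured latency is then exact and test-stable.
with mock.patch("time.time", side_effect=[100.0, 100.5]):
    result, latency = measure(lambda: "ok")
# latency is 0.5 regardless of real execution time
```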

end_time = time.time()
latency = end_time - start_time
this_attempt.notes["total_time"] = latency
Collaborator

Serialization of the time difference should likely be formatted to ensure a clear unit of measure.
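Two common ways to make the unit explicit when serializing (a sketch; the key names are illustrative, not what the PR uses):

```python
import json

elapsed = 0.123456789  # seconds, e.g. from end_time - start_time

# Option A: encode the unit in the key name
notes_a = {"latency_seconds": round(elapsed, 6)}

# Option B: record value and unit as separate fields
notes_b = {"latency": {"value": round(elapsed, 6), "unit": "s"}}

serialized = json.dumps(notes_a)
```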

@leondz marked this pull request as draft August 7, 2025 09:49