
Conversation

@sammykao

Total Latency Logging for each attempt

This PR adds latency measurement for each model generation attempt in garak. The measured latency (in seconds) is now recorded in the notes field of each Attempt entry in the report .jsonl output.


Where

  • Modified garak/probes/base.py
  • Timing logic added around the generator's generate method inside _execute_attempt
  • Latency is stored as:
    "notes": {
      "latency": <float_seconds>
    }

How

  • Latency is measured using wall-clock time from just before the model is called to just after the response is received.
  • Recorded as a float in seconds.
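A minimal sketch of the approach described above, with stand-in names: `Attempt` here is a simplified placeholder for garak's attempt object, and `timed_generate` is an illustrative helper, not the actual `_execute_attempt` code.

```python
import time

class Attempt:
    # Simplified stand-in for garak's Attempt; the real class differs.
    def __init__(self, prompt):
        self.prompt = prompt
        self.notes = {}

def timed_generate(generator_fn, attempt):
    # Wall-clock timing from just before the model call to just after
    # the response is received, stored as float seconds in notes.
    start_time = time.time()
    outputs = generator_fn(attempt.prompt)
    attempt.notes["latency"] = time.time() - start_time
    return outputs
```

Note that `time.monotonic()` is generally preferred over `time.time()` for measuring intervals, since it is unaffected by system clock adjustments.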

Why

  • Enables performance benchmarking and response time analysis.
  • Helps identify:
    • Slow probes
    • Generator inefficiencies
    • Infrastructure bottlenecks

Example Output

{
  "entry_type": "attempt",
  ...
  "notes": {
    "latency": 0.12345
  },
  ...
}

Additional Notes

  • Measures total latency per attempt (including multi-turn sequences).
  • Fully backward-compatible: no changes made to the report format or analysis scripts.

@sammykao
Author

I have read the DCO Document and I hereby sign the DCO

Collaborator

@jmartin-tech left a comment


This is an interesting enhancement; is there a specific need it is fulfilling? While timing each inference is interesting, I am not sure this will offer the desired values, as most generators implement backoff functions to account for errors and rate limits.

As implemented, a generator that needs to retry or back off would store a latency value that may not accurately represent the time spent on a single inference; and for generators that produce multiple generations via repeated calls inside generate(), this would not show how that time was allocated.

Tests also need to account for timing in a controlled way; this is often done by mocking calls to time.time() in tests where it could have an impact.
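A sketch of that mocking technique: patching `time.time` with `unittest.mock` makes the measured latency deterministic. The `measure` helper here is illustrative, not garak code.

```python
import time
from unittest import mock

def measure(fn):
    # Return (result, elapsed_seconds) using wall-clock time.
    start = time.time()
    result = fn()
    return result, time.time() - start

# Patch time.time so the two calls inside measure() return fixed
# values; the measured latency is then exact and test-stable.
with mock.patch("time.time", side_effect=[100.0, 100.5]):
    result, latency = measure(lambda: "ok")
# latency is 0.5 regardless of real execution time
```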

end_time = time.time()
latency = end_time - start_time
this_attempt.notes["total_time"] = latency
Collaborator

Serialization of the time difference should likely be formatted to ensure a clear unit of measure.
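Two common ways to make the unit explicit when serializing (a sketch; the key names are illustrative, not what the PR uses):

```python
import json

elapsed = 0.123456789  # seconds, e.g. from end_time - start_time

# Option A: encode the unit in the key name
notes_a = {"latency_seconds": round(elapsed, 6)}

# Option B: record value and unit as separate fields
notes_b = {"latency": {"value": round(elapsed, 6), "unit": "s"}}

serialized = json.dumps(notes_a)
```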

@leondz marked this pull request as draft August 7, 2025 09:49