feat: Add llm-eval-harness kit by Tharun2511 · Pull Request #179 · Lamatic/AgentKit

Tharun2511 · 2026-06-22T16:58:19Z

What this adds

An LLM Eval Harness kit — it scores an LLM system prompt against a golden set using an LLM-as-judge, then applies a CI-style pass/fail gate. Submitted for the agentkit-challenge.

Problem

Tweaking a prompt or swapping a model can silently regress output quality — hallucinations, over-promising, drift — and you usually find out in production. This makes that quality measurable and repeatable.

How it works

run-target flow runs the system-prompt-under-test on each golden-set case input.
judge flow (LLM-as-judge) scores the output on faithfulness, relevancy, and correctness (0–5). Faithfulness is a veto, so a fluent-but-hallucinated answer can't pass.
The Next.js app loops the golden set (bounded concurrency), aggregates the pass rate, and renders a GATE PASSED / FAILED verdict against a threshold (default 90%). Each case is expandable with the judge's reasoning.
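
The gate described above can be sketched roughly as follows. The PR states only that scores run from 0 to 5, that faithfulness acts as a veto, and that the default threshold is 90%. The per-case cut-offs (`MIN_FAITHFULNESS`, `MIN_MEAN`) and the names `passesCase` and `computeGate` are illustrative assumptions, not the kit's actual code.

```typescript
// Hypothetical sketch: cut-offs and function names are assumptions.
interface Scores {
  faithfulness: number; // 0 to 5
  relevancy: number;    // 0 to 5
  correctness: number;  // 0 to 5
}

const MIN_FAITHFULNESS = 4; // assumed veto cut-off
const MIN_MEAN = 4;         // assumed minimum mean across the three dimensions

// A case passes only if faithfulness clears the veto AND the mean is high enough.
function passesCase(s: Scores): boolean {
  if (s.faithfulness < MIN_FAITHFULNESS) return false; // faithfulness veto
  const mean = (s.faithfulness + s.relevancy + s.correctness) / 3;
  return mean >= MIN_MEAN;
}

// Gate: percentage of passing cases compared with a threshold (default 90).
// An empty golden set is treated as a failed gate here (an assumption).
function computeGate(cases: Scores[], thresholdPct: number = 90) {
  const passed = cases.filter(passesCase).length;
  const passRate = cases.length === 0 ? 0 : (passed / cases.length) * 100;
  return {
    passed: passed,
    total: cases.length,
    passRate: passRate,
    gatePassed: passRate >= thresholdPct,
  };
}
```

Under these assumed cut-offs, a case scored 3/5/5 fails despite a mean above 4, which is the "fluent-but-hallucinated" situation the veto is meant to catch.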

Stack

Lamatic flows (Groq llama-3.3-70b-versatile) · Next.js 16 / React 19 · Tailwind v4 · shadcn/ui. Verified locally end-to-end; npm run build type-checks clean.

Notes / tradeoffs

Single-provider (Groq) in v1; multi-provider bring-your-own-key intentionally deferred — Lamatic stores model credentials at the project level, so runtime credential injection is a security tradeoff better done deliberately.
Judge output is parsed defensively (recovers scores even from slightly-malformed JSON); the gate is recomputed app-side from the dimension scores for determinism.
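
As a rough illustration of what "parsed defensively" could mean here: try strict `JSON.parse` first, then fall back to pulling the three dimension scores out with a regular expression when the JSON is malformed (for example, unescaped quotes in the reasoning). This is a sketch under those assumptions; `parseScores` and `clampScore` are hypothetical names, not the kit's `parseJudgeResult`.

```typescript
// Hypothetical sketch of defensive judge-output parsing.
interface DimensionScores {
  faithfulness: number;
  relevancy: number;
  correctness: number;
}

const DIMENSIONS: Array<keyof DimensionScores> = ["faithfulness", "relevancy", "correctness"];

// Clamp a numeric value into the 0 to 5 range used by the rubric.
function clampScore(value: number): number {
  return Math.min(5, Math.max(0, value));
}

function parseScores(raw: string): DimensionScores | null {
  // 1. Happy path: the judge returned valid JSON with numeric scores.
  try {
    const parsed = JSON.parse(raw);
    if (
      parsed !== null &&
      typeof parsed === "object" &&
      DIMENSIONS.every((d) => typeof parsed[d] === "number" && isFinite(parsed[d]))
    ) {
      return {
        faithfulness: clampScore(parsed.faithfulness),
        relevancy: clampScore(parsed.relevancy),
        correctness: clampScore(parsed.correctness),
      };
    }
  } catch {
    // Malformed JSON: fall through to the regex fallback below.
  }

  // 2. Fallback: recover each `"dimension": <number>` pair individually.
  const scores: Partial<DimensionScores> = {};
  for (const d of DIMENSIONS) {
    const match = new RegExp('"' + d + '"\\s*:\\s*(-?\\d+(?:\\.\\d+)?)').exec(raw);
    if (!match) return null; // a score is missing, so give up rather than guess
    scores[d] = clampScore(parseFloat(match[1]));
  }
  return scores as DimensionScores;
}
```

Because only the dimension scores are recovered, the pass/fail decision can then be recomputed app-side from those numbers instead of trusting the judge's own `pass` field.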

Label: agentkit-challenge

Files Added

Kit / Lamatic (15 files)

kits/llm-eval-harness/.gitignore — ignore local Lamatic/Node artifacts and env files
kits/llm-eval-harness/README.md — kit overview, judge scoring + gate logic, usage, and assumptions (single-provider v1 via Groq)
kits/llm-eval-harness/agent.md — detailed “quality-gate” agent docs for run-target + judge, JSON I/O expectations, and failure modes
kits/llm-eval-harness/constitutions/default.md — judge constitution (strict JSON-only scoring, faithfulness veto, determinism, no answer rewriting)
kits/llm-eval-harness/lamatic.config.ts — kit config wiring required steps judge/run-target to env flow IDs
kits/llm-eval-harness/flows/judge.ts — LLM-as-judge flow: trigger → judge LLM → response mapping to { answer }
kits/llm-eval-harness/flows/run-target.ts — system-under-test flow: trigger → target LLM → response mapping to { answer }
kits/llm-eval-harness/model-configs/judge.ts — judge model configuration wiring
kits/llm-eval-harness/model-configs/run-target.ts — run-target model configuration wiring
kits/llm-eval-harness/prompts/judge_system.md — judge rubric + strict JSON output contract (faithfulness, relevancy, correctness, overall, pass, reasoning)
kits/llm-eval-harness/prompts/judge_user.md — judge prompt template (INPUT/OUTPUT/CRITERIA/REFERENCE sections)
kits/llm-eval-harness/prompts/run-target_system.md — run-target system prompt template (injects systemPrompt)
kits/llm-eval-harness/prompts/run-target_user.md — run-target user prompt template (injects golden input)
kits/llm-eval-harness/apps/.env.example — env placeholders for JUDGE_FLOW, RUN_TARGET_FLOW, and Lamatic Studio credentials
kits/llm-eval-harness/flows/** and prompt/model-configs together define the evaluation pipeline used by the app

Next.js app (selected files listed from provided changes)

kits/llm-eval-harness/apps/.gitignore — ignore Next.js/Node build artifacts and local env files
kits/llm-eval-harness/apps/README.md — app setup + env vars + internal module/component map
kits/llm-eval-harness/apps/actions/orchestrate.ts — server action to run run-target then judge for each golden case, aggregate results, and apply threshold gating
kits/llm-eval-harness/apps/app/globals.css — Tailwind theme tokens + base styles (Geist-themed)
kits/llm-eval-harness/apps/app/layout.tsx — root layout (metadata, fonts, analytics)
kits/llm-eval-harness/apps/app/page.tsx — evaluation UI: validate system prompt + golden set JSON, run evaluation, and render gate/table
kits/llm-eval-harness/apps/components.json — shadcn/ui + Tailwind/RSC configuration
kits/llm-eval-harness/apps/components/gate-banner.tsx — “GATE PASSED/FAILED” banner with pass-rate ring + stats
kits/llm-eval-harness/apps/components/results-table.tsx — expandable per-case results table (scores, reasoning, output)
kits/llm-eval-harness/apps/components/ui/button.tsx — shadcn Button (ref-forwarding for React 19)
kits/llm-eval-harness/apps/components/ui/input.tsx — shadcn Input (ref-forwarding)
kits/llm-eval-harness/apps/components/ui/label.tsx — shadcn Label wrapper
kits/llm-eval-harness/apps/components/ui/textarea.tsx — shadcn Textarea (ref-forwarding)
kits/llm-eval-harness/apps/lib/types.ts — GoldenCase, JudgeResult, CaseResult, RunAggregate
kits/llm-eval-harness/apps/lib/eval.ts — judge output normalization/parsing, concurrency helper, aggregation, and sample golden set
kits/llm-eval-harness/apps/lib/lamatic-client.ts — cached Lamatic client initialization from env vars
kits/llm-eval-harness/apps/lib/utils.ts — cn() class-name merge helper
kits/llm-eval-harness/apps/next.config.mjs — Next/Turbopack config
kits/llm-eval-harness/apps/package.json — app dependencies/scripts
kits/llm-eval-harness/apps/postcss.config.mjs — Tailwind PostCSS config
kits/llm-eval-harness/apps/tsconfig.json — strict TS + app-router config

Note: no flow.json files were found in kits/llm-eval-harness in the scanned repo; flow topology is derived from kits/llm-eval-harness/flows/*.ts.

Flow overview (node types + high level behavior)

`run-target` flow (`kits/llm-eval-harness/flows/run-target.ts`)

Node topology: API Request (trigger) → LLMNode (“Run Target”) → API Response (response mapping)
What it does:
- Accepts { systemPrompt, input } from the app.
- Uses the run-target prompt templates to produce a candidate answer/output from the provided golden-case input under the provided systemPrompt.
- Returns the generated text as answer (mapped for the caller).

`judge` flow (`kits/llm-eval-harness/flows/judge.ts`)

Node topology: API Request (trigger) → LLMNode (“Judge”) → API Response (response mapping)
What it does:
- Accepts { input, output, criteria, reference? }.
- Runs the LLM-as-judge prompt with the strict judge rubric and required JSON output contract.
- Returns a structured verdict in answer containing the three dimension scores plus overall, pass, and brief reasoning; faithfulness is treated as a veto by the gate logic.
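
The two flow contracts above could be written down as types along these lines. Field names come from the descriptions in this PR; the concrete types (numeric scores, boolean `pass`, string `answer`) and the names `RunTargetRequest`, `JudgeRequest`, `FlowResponse`, `JudgeVerdict`, and `isJudgeVerdict` are assumptions for illustration.

```typescript
// Hypothetical type sketch of the flow I/O; not copied from lib/types.ts.
interface RunTargetRequest {
  systemPrompt: string; // the system prompt under test
  input: string;        // one golden-case input
}

interface JudgeRequest {
  input: string;
  output: string;     // the run-target answer being judged
  criteria: string;
  reference?: string; // optional reference answer
}

// Both flows map their result to { answer }; assumed here to be text.
interface FlowResponse {
  answer: string;
}

interface JudgeVerdict {
  faithfulness: number;
  relevancy: number;
  correctness: number;
  overall: number;
  pass: boolean;
  reasoning: string;
}

// Runtime guard so a parsed judge answer is checked before it is trusted.
function isJudgeVerdict(value: unknown): value is JudgeVerdict {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["faithfulness"] === "number" &&
    typeof v["relevancy"] === "number" &&
    typeof v["correctness"] === "number" &&
    typeof v["overall"] === "number" &&
    typeof v["pass"] === "boolean" &&
    typeof v["reasoning"] === "string"
  );
}
```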

LLM-as-judge eval harness: scores a system prompt against a golden set across faithfulness/relevancy/correctness and gates on pass rate.
- judge + run-target Lamatic flow prompts
- Next.js app: orchestrate loop (run-target -> judge), gate banner, expandable per-case results table
- eval utils: HTML-entity decode, defensive JSON parsing, app-side gate recompute, bounded concurrency
- add tsconfig.json + postcss.config.mjs (absent in scaffold)

WIP: lamatic.config / README / agent.md metadata and exported flow files still to be finalised.

@references

- add judge + run-target flow definitions with their prompts and model-config stubs (@references resolve)
- rewrite lamatic.config.ts, README, agent.md, and constitution for the eval-harness kit (problem/approach/results/tradeoffs)
- remove content-generation leftovers; rename app package to llm-eval-harness

- redesign as a dark eval dashboard (pass-rate ring, score chips, glassy cards)
- real-time golden-set JSON validation with inline errors; Run gated on validity
- auto-open the first failed case; stay collapsed when all cases pass
- recover scores from malformed judge JSON (e.g. unescaped quotes in reasoning)
- cap textarea growth and scroll internally

- drop next.config typescript.ignoreBuildErrors (the app type-checks clean)
- type golden-set case items instead of relying on implicit any

coderabbitai · 2026-06-22T16:58:30Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9025ae83-d168-4223-80e7-19f3cffb1cf5

📥 Commits

Reviewing files that changed from the base of the PR and between b9a6355 and 4177814.

📒 Files selected for processing (4)

kits/llm-eval-harness/apps/app/page.tsx
kits/llm-eval-harness/apps/components/ui/button.tsx
kits/llm-eval-harness/apps/components/ui/input.tsx
kits/llm-eval-harness/apps/components/ui/textarea.tsx

Walkthrough

This PR adds the kits/llm-eval-harness kit, Lamatic flow definitions and prompts, shared evaluation utilities and types, and a Next.js app that runs golden-set evaluations and displays a pass/fail gate with case details.

Changes

LLM Eval Harness Kit and Next.js App

Layer / File(s)	Summary
Kit definition, flows, prompts, and model configs `kits/llm-eval-harness/.gitignore`, `kits/llm-eval-harness/README.md`, `kits/llm-eval-harness/agent.md`, `kits/llm-eval-harness/lamatic.config.ts`, `kits/llm-eval-harness/constitutions/default.md`, `kits/llm-eval-harness/flows/judge.ts`, `kits/llm-eval-harness/flows/run-target.ts`, `kits/llm-eval-harness/model-configs/`, `kits/llm-eval-harness/prompts/`	Defines the Lamatic kit config with `judge` and `run-target` step bindings, both flow node/edge graphs, model config stubs, the default judge constitution, and prompt templates wired to `triggerNode_1.output.*` placeholders.
App project scaffold and theme setup `kits/llm-eval-harness/apps/package.json`, `kits/llm-eval-harness/apps/tsconfig.json`, `kits/llm-eval-harness/apps/next.config.mjs`, `kits/llm-eval-harness/apps/postcss.config.mjs`, `kits/llm-eval-harness/apps/.env.example`, `kits/llm-eval-harness/apps/.gitignore`, `kits/llm-eval-harness/apps/README.md`, `kits/llm-eval-harness/apps/app/globals.css`, `kits/llm-eval-harness/apps/app/layout.tsx`, `kits/llm-eval-harness/apps/lib/utils.ts`	Bootstraps the Next.js app with Turbopack root config, Tailwind v4 PostCSS pipeline, dark-mode CSS design tokens, root layout with Geist fonts and Vercel Analytics, and a `cn` class-merge utility.
Evaluation contracts, utilities, and orchestration `kits/llm-eval-harness/apps/lib/types.ts`, `kits/llm-eval-harness/apps/lib/eval.ts`, `kits/llm-eval-harness/apps/lib/lamatic-client.ts`, `kits/llm-eval-harness/apps/actions/orchestrate.ts`	Defines `GoldenCase`, `JudgeResult`, `CaseResult`, and `RunAggregate` interfaces; exports `parseJudgeResult`, `computeAggregate`, and `mapWithConcurrency`; adds a memoized `getLamaticClient`; and implements `runEvaluation` server action with input validation, bounded concurrency of 3, and a `{ success, data/error }` envelope.
Shared UI primitives and result components `kits/llm-eval-harness/apps/components/ui/*`, `kits/llm-eval-harness/apps/components/gate-banner.tsx`, `kits/llm-eval-harness/apps/components/results-table.tsx`	Adds Shadcn-style `Button`, `Input`, `Label`, and `Textarea` primitives; `GateBanner` with an SVG pass-rate ring, verdict text, and three stat tiles; and `ResultsTable` with per-case score chips, auto-expand of the first failing case, and expandable output/reasoning/criteria detail rows.
Evaluation page and input validation `kits/llm-eval-harness/apps/app/page.tsx`	Adds the client-side `EvalHarnessPage` with Zod-driven form validation, state for prompt/threshold/loading/results, `onSubmit` calling `runEvaluation`, and conditional rendering of `GateBanner` and `ResultsTable` from the returned `RunAggregate`.

Suggested reviewers

amanintech
d-pamneja

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The description is detailed, but it misses the repository's required PR checklist sections and validation/template fields.	Add the full PR Checklist from the template, including contribution type, general requirements, file structure, and validation items.
Docstring Coverage	⚠️ Warning	Docstring coverage is 5.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title is concise and accurately summarizes the main change: adding the llm-eval-harness kit.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

github-actions · 2026-06-22T16:58:34Z

:robot_face: AgentKit Structural Validation

New Contributions Detected

Kit: kits/llm-eval-harness

Check Results

Check	Status
No edits to existing kits	✅ Pass
Required root files present	✅ Pass
Flow .ts files present	✅ Pass
lamatic.config.ts valid	✅ Pass
No changes outside kits/	✅ Pass

⚠️ Warnings

kits/llm-eval-harness is missing .env.example — bundles and kits should include one

🎉 All checks passed! This contribution follows the AgentKit structure.

coderabbitai

Actionable comments posted: 26

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

kits/llm-eval-harness/apps/components/ui/use-toast.ts (1)

1-192: ⚠️ Potential issue | 🟡 Minor

Remove duplicate dead code: components/ui/use-toast.ts is unused.

This file duplicates kits/llm-eval-harness/apps/hooks/use-toast.ts identically and is never imported anywhere in the codebase. The canonical implementation lives in hooks. Delete this file to eliminate code duplication and future confusion about which path is authoritative.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@kits/llm-eval-harness/apps/components/ui/use-toast.ts` around lines 1 - 192,
Delete the entire file `kits/llm-eval-harness/apps/components/ui/use-toast.ts`
as it is a duplicate of the canonical implementation in
`kits/llm-eval-harness/apps/hooks/use-toast.ts`. This file containing the
exports of useToast, toast, and the reducer function is not imported anywhere in
the codebase and creates confusion about which version is authoritative. Use the
hooks directory version as the single source of truth.

kits/llm-eval-harness/apps/components/ui/use-mobile.tsx (1)

1-20: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Mission note: eliminate duplicate hook logic to prevent drift.

useIsMobile is duplicated here and in kits/llm-eval-harness/apps/hooks/use-mobile.ts (Lines 1-20). Keep one canonical implementation and re-export it from this UI path.

Proposed consolidation

-import * as React from 'react'
-
-const MOBILE_BREAKPOINT = 768
-
-export function useIsMobile() {
-  const [isMobile, setIsMobile] = React.useState<boolean | undefined>(undefined)
-
-  React.useEffect(() => {
-    const mql = window.matchMedia(`(max-width: ${MOBILE_BREAKPOINT - 1}px)`)
-    const onChange = () => {
-      setIsMobile(window.innerWidth < MOBILE_BREAKPOINT)
-    }
-    mql.addEventListener('change', onChange)
-    setIsMobile(window.innerWidth < MOBILE_BREAKPOINT)
-    return () => mql.removeEventListener('change', onChange)
-  }, [])
-
-  return !!isMobile
-}
+export { useIsMobile } from '@/hooks/use-mobile'

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@kits/llm-eval-harness/apps/components/ui/use-mobile.tsx` around lines 1 - 20,
The useIsMobile function is duplicated across two files which risks
implementation drift. Remove the useIsMobile implementation from the UI
components file (use-mobile.tsx) and instead import and re-export it from the
canonical location in kits/llm-eval-harness/apps/hooks/use-mobile.ts to maintain
a single source of truth for this hook logic.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@kits/llm-eval-harness/apps/actions/orchestrate.ts`:
- Around line 3-4: Import the lamatic.config file from the parent kit directory
(../../lamatic.config) at the top of the file alongside the existing imports
from `@/lib/lamatic-client` and `@/lib/eval`. Replace all references to environment
variables for resolving judge and runTarget flow IDs (including the code block
at lines 23-30) with reads from the imported lamatic.config step definitions
instead, ensuring the flow IDs are sourced from the kit's declared configuration
rather than environment variables.
- Around line 47-60: The runEvaluation function accepts the threshold parameter
without validation, which allows NaN or out-of-range values to be passed to
computeAggregate and cause invalid gate verdicts. Add threshold validation in
the try block alongside the existing systemPrompt and cases validation, ensuring
threshold is a finite number and within a valid numeric range (such as 0 to 100
for percentage-based comparisons). Use the same error-throwing pattern as the
systemPrompt.trim() and Array.isArray(cases) checks to provide clear validation
messages.
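
A minimal sketch of the threshold check suggested for `runEvaluation`, assuming a percentage-based threshold in the 0 to 100 range. The helper name `assertValidThreshold` and the error messages are hypothetical.

```typescript
// Hypothetical helper: rejects NaN, Infinity, non-numbers, and out-of-range values.
function assertValidThreshold(threshold: unknown): number {
  if (typeof threshold !== "number" || !isFinite(threshold)) {
    throw new Error("threshold must be a finite number");
  }
  if (threshold < 0 || threshold > 100) {
    throw new Error("threshold must be between 0 and 100");
  }
  return threshold;
}
```

Calling this at the top of the server action, next to the existing `systemPrompt` and `cases` checks, would keep invalid values from ever reaching the aggregate computation.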

In `@kits/llm-eval-harness/apps/app/layout.tsx`:
- Around line 6-7: The Geist and Geist_Mono font variables are being created but
not utilized in the layout. Remove the underscore prefixes from the variable
names `_geist` and `_geistMono` to indicate they are being used, then apply
these font variables to the root layout element by incorporating their variable
names into the body className or style configuration so the loaded Google Fonts
are actually applied instead of remaining unused.
- Around line 1-44: The CLAUDE.md guidelines document is out of sync with the
current codebase. Update the mission parameters in CLAUDE.md to authorize
Next.js 16.0.0 and React 19.2.0 instead of the currently listed Next.js 14-15
and React 18, since these newer versions are now the standard practice across
the kit ecosystem and are actively being used in the current deployment as
evidenced by the layout.tsx file and broader kit ecosystem confirmation.

In `@kits/llm-eval-harness/apps/app/page.tsx`:
- Line 153: The onChange handler for threshold input at line 153 allows NaN
values to corrupt the state when users type intermediate invalid characters like
"-", "e", or during empty input edits. Modify the onChange callback to check if
Number(e.target.value) produces NaN, and if it does, either skip the state
update or default to a safe value like 0 instead of allowing the NaN to
propagate through Math.max and Math.min into the setThreshold state.
- Around line 30-36: The validation loop checking parsed items is too permissive
and allows blank strings and unvalidated optional fields to pass through before
casting to GoldenCase[]. Enhance the validation condition to not only check that
input and criteria are strings but also verify they are non-empty strings.
Additionally add type validation for optional fields like id and reference if
they exist in the item object, ensuring they match the expected types defined in
GoldenCase. Apply these stricter checks within the loop condition before the
final cast to GoldenCase[] occurs at the return statement.
- Around line 20-70: Replace the manual useState-based form state management in
the EvalHarnessPage component with react-hook-form and zod. Create a Zod schema
that validates systemPrompt (non-empty string), goldenSet (valid JSON array with
objects containing input and criteria string fields), and threshold (number)
fields, then use the useForm hook with zodResolver to initialize the form.
Remove or integrate the validateGoldenSet function into the schema validation,
update the handleRun method to use the form submission handler pattern, and
modify loadExample to use form.setValue instead of individual setters. This
ensures the form follows project standards and maintains validation consistency
with server expectations.
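
The stricter client-side checks described for `page.tsx` might look like the sketch below. It is written in plain TypeScript rather than the Zod schema the review asks for, so it only illustrates the rules (non-empty `input` and `criteria`, typed optional fields, NaN-safe threshold input). The `string` types for `id` and `reference`, the result shape, and the name `parseThresholdInput` are assumptions.

```typescript
// Hypothetical sketch; optional field types are assumed to be strings.
interface GoldenCase {
  id?: string;
  input: string;
  criteria: string;
  reference?: string;
}

type GoldenSetResult =
  | { ok: true; cases: GoldenCase[] }
  | { ok: false; error: string };

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.trim().length > 0;
}

function fail(error: string): GoldenSetResult {
  return { ok: false, error: error };
}

function validateGoldenSet(json: string): GoldenSetResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(json);
  } catch {
    return fail("Golden set is not valid JSON");
  }
  if (!Array.isArray(parsed) || parsed.length === 0) {
    return fail("Golden set must be a non-empty JSON array");
  }
  const cases: GoldenCase[] = [];
  for (let i = 0; i < parsed.length; i++) {
    const item: unknown = parsed[i];
    if (typeof item !== "object" || item === null) {
      return fail("Case " + i + " must be an object");
    }
    const record = item as Record<string, unknown>;
    const input = record["input"];
    const criteria = record["criteria"];
    const id = record["id"];
    const reference = record["reference"];
    if (!isNonEmptyString(input) || !isNonEmptyString(criteria)) {
      return fail("Case " + i + " needs non-empty string input and criteria");
    }
    const entry: GoldenCase = { input: input, criteria: criteria };
    if (typeof id === "string") entry.id = id;
    else if (id !== undefined) return fail("Case " + i + ": id must be a string");
    if (typeof reference === "string") entry.reference = reference;
    else if (reference !== undefined) return fail("Case " + i + ": reference must be a string");
    cases.push(entry);
  }
  return { ok: true, cases: cases };
}

// Keep the previous threshold while the user types something non-numeric.
function parseThresholdInput(raw: string, previous: number): number {
  const n = Number(raw);
  if (raw.trim() === "" || !isFinite(n)) return previous;
  return Math.min(100, Math.max(0, n));
}
```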

In `@kits/llm-eval-harness/apps/components/results-table.tsx`:
- Around line 51-55: Add accessibility attributes to expose the row expansion
state to assistive technology. On the toggle button element, add the
aria-expanded attribute set to the isOpen boolean value to indicate whether the
row is currently expanded, and add an aria-controls attribute with a unique
identifier string. Then locate the corresponding details panel for this row
(referenced in lines 86-94) and add an id attribute to it that matches the value
used in the button's aria-controls attribute. This enables screen readers to
track the open and closed states of expandable rows.
- Line 50: The Fragment key at line 50 uses `result.case.id ?? index` which
causes collisions when IDs are null or repeat across rows, leading to state
reuse and incorrect expansion behavior. Replace the key with a collision-proof
value that combines both the ID and index to ensure uniqueness, such as
concatenating result.case.id with the index parameter, so that even when IDs
repeat or are undefined, each row maintains its own distinct key and state.
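
One way to build the collision-proof key mentioned above, as a small sketch. The helper name `rowKey` and the `"case"` placeholder prefix are hypothetical.

```typescript
// Hypothetical helper: combines the optional case id with the row index so
// repeated or missing ids still produce distinct keys.
function rowKey(id: string | null | undefined, index: number): string {
  return (id ?? "case") + "-" + index;
}
```

Including the index is reasonable here because the results list is rendered once per run and not reordered; for a list that can be reordered, an index-based key would bring back the state-reuse problem.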

In `@kits/llm-eval-harness/apps/components/theme-provider.tsx`:
- Around line 1-11: The kit app in kits/llm-eval-harness/apps/ declares Next.js
16.0.0 and React 19.2.0, which violate the repository guideline requiring
Next.js 14–15 and React 18. Update the package.json dependencies to downgrade
next to a compliant version within the 14–15 range and react to version 18,
while keeping all other dependencies (Tailwind CSS v4+, react-hook-form, zod,
lucide-react, lamatic) aligned as they currently are.

In `@kits/llm-eval-harness/apps/components/ui/breadcrumb.tsx`:
- Around line 52-62: The BreadcrumbPage function incorrectly assigns role="link"
to a non-interactive span element, which creates an accessibility breach because
spans are not keyboard-navigable by default and assistive technology expects
link roles to be focusable and activatable. Remove the role="link" attribute
from the span since the aria-current="page" attribute already provides
sufficient semantic information to screen readers to indicate the current page,
and a non-interactive span is the correct choice here.

In `@kits/llm-eval-harness/apps/components/ui/button.tsx`:
- Around line 39-58: The Button function component does not handle the ref prop,
which breaks when CalendarDayButton attempts to pass a ref for focus management.
To fix this for React 19, add ref as a destructured parameter in the Button
function signature alongside asChild, and then pass it directly to the Comp
component using the ref attribute. This allows ref to be treated as a standard
prop without needing forwardRef.

In `@kits/llm-eval-harness/apps/components/ui/carousel.tsx`:
- Around line 96-105: In the React.useEffect hook, the event listener registered
with api.on('reInit', onSelect) is missing cleanup in the return function. Add a
corresponding api.off('reInit', onSelect) call in the cleanup function alongside
the existing api.off('select', onSelect) call to ensure both event listeners are
properly removed when the component unmounts or dependencies change. This
prevents stale handlers from being retained and avoids duplicate state updates
across lifecycle transitions.

In `@kits/llm-eval-harness/apps/components/ui/chart.tsx`:
- Around line 235-239: The conditional check `item.value &&` in the tooltip
render logic treats 0 as falsy, causing zero-valued series to be hidden from the
tooltip display. Replace the truthy check with an explicit null/undefined check
instead, such as checking if `item.value != null` or `typeof item.value !==
'undefined'`, to ensure that legitimate zero values are rendered in the span
element alongside the toLocaleString() formatting.
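
The difference between the truthy check and the explicit null check can be shown in isolation. This is a standalone sketch, not the component's code; `formatTooltipValue` is a hypothetical name.

```typescript
// Hypothetical helper: returns null only when there is no value at all,
// so a legitimate 0 is still formatted and shown.
function formatTooltipValue(value: number | null | undefined): string | null {
  if (value == null) return null; // matches both null and undefined, but not 0
  return value.toLocaleString();
}
```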

In `@kits/llm-eval-harness/apps/components/ui/form.tsx`:
- Around line 45-66: The useFormField hook accesses fieldContext.name before
validating that fieldContext exists, and the validation check itself is flawed
because the context default value is an empty object which is always truthy.
Move the validation check for fieldContext to the beginning of the hook, before
any context properties are accessed, and update the validation logic to check if
fieldContext.name is defined rather than just checking if fieldContext exists,
since an empty object will not trigger the current falsy check. This ensures the
hook properly detects when it is called outside the FormField component wrapper.

In `@kits/llm-eval-harness/apps/components/ui/kbd.tsx`:
- Around line 18-26: The KbdGroup component is currently rendering a `<kbd>`
element, which creates semantically incorrect nested kbd tags when it contains
child Kbd components. Replace the `<kbd>` element in the KbdGroup function with
a `<div>` or `<span>` element instead, while keeping all other attributes
(data-slot, className with the cn() utility, and props spreading) exactly the
same. This preserves the visual grouping behavior while maintaining proper HTML
semantics.

In `@kits/llm-eval-harness/apps/components/ui/menubar.tsx`:
- Around line 227-240: The MenubarSubTrigger component is using the Tailwind
utility class outline-none in its className string, but other interactive
elements in this file (referenced at lines 59, 106, 124, and 148) use
outline-hidden instead. Replace outline-none with outline-hidden in the
className prop of the MenubarPrimitive.SubTrigger to maintain consistency across
the component and align with Tailwind v4 best practices for complete outline
removal on custom focus-styled components.

In `@kits/llm-eval-harness/apps/components/ui/pagination.tsx`:
- Around line 102-117: The PaginationEllipsis function has a contradiction where
the parent span element is marked with aria-hidden but contains a child span
with sr-only class that provides "More pages" text for screen readers. When a
parent is aria-hidden, assistive technologies cannot read its children. Remove
the aria-hidden attribute from the outer span element in PaginationEllipsis so
that the screen reader text inside the sr-only span will be properly announced
to users relying on assistive technology.

In `@kits/llm-eval-harness/apps/components/ui/table.tsx`:
- Around line 42-52: In the TableFooter function, the arbitrary Tailwind variant
in the className prop contains an invalid CSS pseudo-class selector. Replace the
incorrect `:last` pseudo-class with `:last-child` in the string
`[&>tr]:last:border-b-0` to ensure the border-bottom utility is properly applied
only to the last table row. The selector should read
`[&>tr]:last-child:border-b-0` to correctly target the final child element and
remove its bottom border.

In `@kits/llm-eval-harness/apps/components/ui/toggle.tsx`:
- Around line 9-29: The toggleVariants constant definition contains outline-none
in its base variant string, but Tailwind v4 changed this utility to only remove
the outline-style property rather than the entire outline. Locate the
outline-none utility in the base variant string of the toggleVariants constant
and replace it with outline-hidden to ensure complete outline removal and
maintain proper visual presentation in Tailwind v4.

In `@kits/llm-eval-harness/apps/components/ui/tooltip.tsx`:
- Around line 21-29: The Tooltip component wraps its content in a
TooltipProvider, but since sidebar.tsx already has a TooltipProvider at the
SidebarProvider level, this creates nested providers which is inefficient.
Export the base TooltipPrimitive.Root as a separate named export (such as
TooltipRoot) in the tooltip.tsx file alongside the existing Tooltip component,
then update sidebar.tsx to use this TooltipRoot component directly instead of
the Tooltip component to avoid the unnecessary provider nesting while still
maintaining tooltip functionality.

In `@kits/llm-eval-harness/apps/hooks/use-toast.ts`:
- Around line 174-183: The useEffect hook in the useToast function has an
incorrect dependency array that includes state, causing unnecessary
re-subscriptions on every state change. Since the effect only uses the stable
setState function from React.useState and never actually references the state
value, change the dependency array from [state] to an empty array []. This
prevents the listener from being unsubscribed and re-subscribed on state
changes, eliminating the race window where dispatched updates could be missed.

In `@kits/llm-eval-harness/apps/lib/eval.ts`:
- Around line 123-141: The mapWithConcurrency function lacks validation for the
limit parameter, which can accept 0, negative, or NaN values. When limit is
invalid, Math.min(limit, items.length) will create zero workers, causing the
function to return an array with uninitialized slots without throwing an error.
Add an explicit check at the start of the mapWithConcurrency function to
validate that limit is a positive integer, and throw an appropriate error if the
validation fails before proceeding with worker creation.
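
A sketch of a bounded-concurrency mapper with the suggested `limit` guard. It follows the worker-pool shape the comment describes (`Math.min(limit, items.length)` workers pulling from a shared index), but it is an illustration written for this thread, not the kit's implementation.

```typescript
// Hypothetical sketch of mapWithConcurrency with limit validation.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  // Reject 0, negatives, NaN, Infinity, and fractional limits up front.
  if (typeof limit !== "number" || !isFinite(limit) || limit < 1 || Math.floor(limit) !== limit) {
    throw new RangeError("limit must be a positive integer, got " + limit);
  }

  const results: R[] = new Array(items.length);
  let next = 0; // shared cursor; safe because JS runs one callback at a time

  // Each worker repeatedly claims the next unprocessed index.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index], index);
    }
  }

  const workerCount = Math.min(limit, items.length);
  const workers: Promise<void>[] = [];
  for (let i = 0; i < workerCount; i++) workers.push(worker());
  await Promise.all(workers);
  return results; // same order as `items`
}
```

Because the function is `async`, an invalid `limit` surfaces as a rejected promise rather than a synchronous throw, so callers should `await` it inside their existing `try` block.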

In `@kits/llm-eval-harness/apps/next.config.mjs`:
- Around line 3-5: The images configuration object in next.config.mjs has
unoptimized set to true, which disables Next.js's automatic image optimization.
Verify the actual deployment target for this kit - if it will be deployed to
Vercel or a similar platform that supports automatic image optimization,
consider removing the unoptimized: true setting to enable performance
optimization. If the kit is intended for static export or a specific platform
that requires this setting, add a comment documenting the reason for disabling
image optimization.

In `@kits/llm-eval-harness/apps/package.json`:
- Line 53: The lamatic dependency in package.json is pinned to "latest" which
creates non-deterministic builds and risks pulling breaking changes. Replace the
"latest" value for the lamatic package with a specific version number or a
version range using semantic versioning (for example, using caret notation like
"^1.2.3" to allow compatible minor and patch updates while locking the major
version). Check the current resolved version in
node_modules/lamatic/package.json to determine the appropriate version to pin
to.

In `@kits/llm-eval-harness/prompts/judge_system.md`:
- Around line 50-59: The JSON code block in judge_system.md starting at line 50
is missing a language fence specification, which violates markdownlint rule
MD040. Add `json` immediately after the opening triple backticks (change ``` to
```json) to specify the code block language and resolve the linting error.

---

Outside diff comments:
In `@kits/llm-eval-harness/apps/components/ui/use-mobile.tsx`:
- Around line 1-20: The useIsMobile function is duplicated across two files
which risks implementation drift. Remove the useIsMobile implementation from the
UI components file (use-mobile.tsx) and instead import and re-export it from the
canonical location in kits/llm-eval-harness/apps/hooks/use-mobile.ts to maintain
a single source of truth for this hook logic.

In `@kits/llm-eval-harness/apps/components/ui/use-toast.ts`:
- Around line 1-192: Delete the entire file
`kits/llm-eval-harness/apps/components/ui/use-toast.ts` as it is a duplicate of
the canonical implementation in `kits/llm-eval-harness/apps/hooks/use-toast.ts`.
This file containing the exports of useToast, toast, and the reducer function is
not imported anywhere in the codebase and creates confusion about which version
is authoritative. Use the hooks directory version as the single source of truth.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 23a86a12-90c8-44ec-83a2-f10ae8469787

📥 Commits

Reviewing files that changed from the base of the PR and between f531be9 and 0648275.

⛔ Files ignored due to path filters (1)

kits/llm-eval-harness/apps/package-lock.json is excluded by !**/package-lock.json

📒 Files selected for processing (91)

kits/llm-eval-harness/.gitignore
kits/llm-eval-harness/README.md
kits/llm-eval-harness/agent.md
kits/llm-eval-harness/apps/.env.example
kits/llm-eval-harness/apps/.gitignore
kits/llm-eval-harness/apps/README.md
kits/llm-eval-harness/apps/actions/orchestrate.ts
kits/llm-eval-harness/apps/app/globals.css
kits/llm-eval-harness/apps/app/layout.tsx
kits/llm-eval-harness/apps/app/page.tsx
kits/llm-eval-harness/apps/components.json
kits/llm-eval-harness/apps/components/gate-banner.tsx
kits/llm-eval-harness/apps/components/results-table.tsx
kits/llm-eval-harness/apps/components/theme-provider.tsx
kits/llm-eval-harness/apps/components/ui/accordion.tsx
kits/llm-eval-harness/apps/components/ui/alert-dialog.tsx
kits/llm-eval-harness/apps/components/ui/alert.tsx
kits/llm-eval-harness/apps/components/ui/aspect-ratio.tsx
kits/llm-eval-harness/apps/components/ui/avatar.tsx
kits/llm-eval-harness/apps/components/ui/badge.tsx
kits/llm-eval-harness/apps/components/ui/breadcrumb.tsx
kits/llm-eval-harness/apps/components/ui/button-group.tsx
kits/llm-eval-harness/apps/components/ui/button.tsx
kits/llm-eval-harness/apps/components/ui/calendar.tsx
kits/llm-eval-harness/apps/components/ui/card.tsx
kits/llm-eval-harness/apps/components/ui/carousel.tsx
kits/llm-eval-harness/apps/components/ui/chart.tsx
kits/llm-eval-harness/apps/components/ui/checkbox.tsx
kits/llm-eval-harness/apps/components/ui/collapsible.tsx
kits/llm-eval-harness/apps/components/ui/command.tsx
kits/llm-eval-harness/apps/components/ui/context-menu.tsx
kits/llm-eval-harness/apps/components/ui/dialog.tsx
kits/llm-eval-harness/apps/components/ui/drawer.tsx
kits/llm-eval-harness/apps/components/ui/dropdown-menu.tsx
kits/llm-eval-harness/apps/components/ui/empty.tsx
kits/llm-eval-harness/apps/components/ui/field.tsx
kits/llm-eval-harness/apps/components/ui/form.tsx
kits/llm-eval-harness/apps/components/ui/hover-card.tsx
kits/llm-eval-harness/apps/components/ui/input-group.tsx
kits/llm-eval-harness/apps/components/ui/input-otp.tsx
kits/llm-eval-harness/apps/components/ui/input.tsx
kits/llm-eval-harness/apps/components/ui/item.tsx
kits/llm-eval-harness/apps/components/ui/kbd.tsx
kits/llm-eval-harness/apps/components/ui/label.tsx
kits/llm-eval-harness/apps/components/ui/menubar.tsx
kits/llm-eval-harness/apps/components/ui/navigation-menu.tsx
kits/llm-eval-harness/apps/components/ui/pagination.tsx
kits/llm-eval-harness/apps/components/ui/popover.tsx
kits/llm-eval-harness/apps/components/ui/progress.tsx
kits/llm-eval-harness/apps/components/ui/radio-group.tsx
kits/llm-eval-harness/apps/components/ui/resizable.tsx
kits/llm-eval-harness/apps/components/ui/scroll-area.tsx
kits/llm-eval-harness/apps/components/ui/select.tsx
kits/llm-eval-harness/apps/components/ui/separator.tsx
kits/llm-eval-harness/apps/components/ui/sheet.tsx
kits/llm-eval-harness/apps/components/ui/sidebar.tsx
kits/llm-eval-harness/apps/components/ui/skeleton.tsx
kits/llm-eval-harness/apps/components/ui/slider.tsx
kits/llm-eval-harness/apps/components/ui/sonner.tsx
kits/llm-eval-harness/apps/components/ui/spinner.tsx
kits/llm-eval-harness/apps/components/ui/switch.tsx
kits/llm-eval-harness/apps/components/ui/table.tsx
kits/llm-eval-harness/apps/components/ui/tabs.tsx
kits/llm-eval-harness/apps/components/ui/textarea.tsx
kits/llm-eval-harness/apps/components/ui/toast.tsx
kits/llm-eval-harness/apps/components/ui/toaster.tsx
kits/llm-eval-harness/apps/components/ui/toggle-group.tsx
kits/llm-eval-harness/apps/components/ui/toggle.tsx
kits/llm-eval-harness/apps/components/ui/tooltip.tsx
kits/llm-eval-harness/apps/components/ui/use-mobile.tsx
kits/llm-eval-harness/apps/components/ui/use-toast.ts
kits/llm-eval-harness/apps/hooks/use-mobile.ts
kits/llm-eval-harness/apps/hooks/use-toast.ts
kits/llm-eval-harness/apps/lib/eval.ts
kits/llm-eval-harness/apps/lib/lamatic-client.ts
kits/llm-eval-harness/apps/lib/types.ts
kits/llm-eval-harness/apps/lib/utils.ts
kits/llm-eval-harness/apps/next.config.mjs
kits/llm-eval-harness/apps/package.json
kits/llm-eval-harness/apps/postcss.config.mjs
kits/llm-eval-harness/apps/tsconfig.json
kits/llm-eval-harness/constitutions/default.md
kits/llm-eval-harness/flows/judge.ts
kits/llm-eval-harness/flows/run-target.ts
kits/llm-eval-harness/lamatic.config.ts
kits/llm-eval-harness/model-configs/judge.ts
kits/llm-eval-harness/model-configs/run-target.ts
kits/llm-eval-harness/prompts/judge_system.md
kits/llm-eval-harness/prompts/judge_user.md
kits/llm-eval-harness/prompts/run-target_system.md
kits/llm-eval-harness/prompts/run-target_user.md

coderabbitai · 2026-06-22T17:10:54Z

+  React.useEffect(() => {
+    listeners.push(setState)
+    return () => {
+      const index = listeners.indexOf(setState)
+      if (index > -1) {
+        listeners.splice(index, 1)
+      }
+    }
+  }, [state])
+


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "Find use-toast effects that re-subscribe on state:"
rg -n --type=ts --type=tsx '\}, \[state\]\)' kits/llm-eval-harness/apps/hooks/use-toast.ts kits/llm-eval-harness/apps/components/ui/use-toast.ts -C3

Repository: Lamatic/AgentKit

Length of output: 157

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Check if file exists
if [ -f "kits/llm-eval-harness/apps/hooks/use-toast.ts" ]; then
  echo "=== File found, reading lines 174-200 ==="
  sed -n '174,200p' kits/llm-eval-harness/apps/hooks/use-toast.ts | cat -n
  echo ""
  echo "=== Full file context (first 50 lines) ==="
  head -50 kits/llm-eval-harness/apps/hooks/use-toast.ts | cat -n
else
  echo "File not found at expected path. Searching for use-toast files:"
  find . -name "*use-toast*" -type f 2>/dev/null || echo "No files found"
fi

Repository: Lamatic/AgentKit

Length of output: 1996

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Search for the exact pattern with simpler approach
echo "=== Searching for useEffect with state dependency ==="
rg -n "useEffect" kits/llm-eval-harness/apps/hooks/ -A 5 -B 1
echo ""
echo "=== Listing all files in hooks directory ==="
find kits/llm-eval-harness/apps/hooks -type f 2>/dev/null || echo "Directory not found"

Repository: Lamatic/AgentKit

Length of output: 1376

🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "=== Full use-toast hook around the useEffect ==="
sed -n '150,192p' kits/llm-eval-harness/apps/hooks/use-toast.ts | cat -n

Repository: Lamatic/AgentKit

Length of output: 1155

Mission critical: eliminate listener subscription churn that creates notification gaps.

Your useToast hook re-subscribes the listener on every state change (line 182). Since setState from React.useState is stable and the effect never actually uses the state value—only setState—the dependency array should be empty. Each unnecessary re-run unsubscribes and re-subscribes the same listener, creating a race window where dispatched updates could be missed before re-subscription completes.

Suggested patch

   React.useEffect(() => {
     listeners.push(setState)
     return () => {
       const index = listeners.indexOf(setState)
       if (index > -1) {
         listeners.splice(index, 1)
       }
     }
-  }, [state])
+  }, [])

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@kits/llm-eval-harness/apps/hooks/use-toast.ts` around lines 174 - 183, The useEffect hook in the useToast function has an incorrect dependency array that includes state, causing unnecessary re-subscriptions on every state change. Since the effect only uses the stable setState function from React.useState and never actually references the state value, change the dependency array from [state] to an empty array []. This prevents the listener from being unsubscribed and re-subscribed on state changes, eliminating the race window where dispatched updates could be missed.
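The subscription churn described in this comment can be illustrated without React. The sketch below is a hand-rolled simulation, not the kit's code and not real React scheduling: `subscribe`, `dispatch`, `simulate`, and the call counter are hypothetical names introduced only for illustration. It counts how many times the same listener is registered when the effect is re-run after every state change (the `[state]` case) versus once on mount (the `[]` case).

```typescript
// Hypothetical stand-in for the module-level listener list in use-toast.ts.
type Listener = (state: number) => void;

const listeners: Listener[] = [];
let subscribeCalls = 0;

// Registers a listener and returns the matching cleanup, like the effect body.
function subscribe(listener: Listener): () => void {
  subscribeCalls += 1;
  listeners.push(listener);
  return () => {
    const index = listeners.indexOf(listener);
    if (index > -1) {
      listeners.splice(index, 1);
    }
  };
}

// Notifies every currently registered listener.
function dispatch(state: number): void {
  for (const listener of [...listeners]) {
    listener(state);
  }
}

// Simulates `updates` state changes and returns how often subscribe() ran.
// resubscribeOnChange = true models deps [state]; false models deps [].
function simulate(updates: number, resubscribeOnChange: boolean): number {
  subscribeCalls = 0;
  listeners.length = 0;
  const seen: number[] = [];
  const listener: Listener = (s) => {
    seen.push(s);
  };
  let cleanup = subscribe(listener); // mount
  for (let i = 1; i <= updates; i++) {
    dispatch(i);
    if (resubscribeOnChange) {
      cleanup(); // cleanup for the previous state
      cleanup = subscribe(listener); // effect re-run for the new state
    }
  }
  cleanup(); // unmount
  return subscribeCalls;
}

const churn = simulate(3, true); // one subscribe on mount plus one per update
const stable = simulate(3, false); // a single subscribe on mount
```

With three updates, the first variant registers the listener four times and the second only once, and both leave the listener list empty after unmount.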

…fig-driven flows)

- remove ~50 unused shadcn boilerplate components + hooks + theme-provider (keep only button/input/label/textarea that are actually used)
- resolve flow IDs from lamatic.config step definitions (widen turbopack root so the orchestrate action can import ../../lamatic.config)
- validate threshold (server action + input) and the concurrency limit
- stricter golden-set validation: non-empty input/criteria, typed optional fields
- results rows: aria-expanded/aria-controls + collision-proof keys
- Button forwards ref (React 19); wire Geist fonts; pin lamatic ^0.3.2; json-fence the judge prompt; drop unused images.unoptimized

Tharun2511 · 2026-06-23T04:33:00Z

Thanks for the detailed review — addressed in 22fac39.

Most of the UI-component findings are resolved by removing the unused shadcn boilerplate: this kit only ships the four components it actually uses (button, input, label, textarea), so the duplicated hooks, breadcrumb, carousel, chart, form, kbd, menubar, pagination, table, toggle, tooltip, and theme-provider are no longer present.

Also fixed:

Flow IDs now resolve from lamatic.config step definitions (the orchestrate action imports ../../lamatic.config).
threshold validated in the server action and clamped on input; mapWithConcurrency validates its limit.
Stricter golden-set validation (non-empty input/criteria, typed optional fields).
Result rows: aria-expanded/aria-controls + collision-proof keys.
Button forwards ref (React 19); Geist fonts wired; lamatic pinned to ^0.3.2; JSON code-fence in the judge prompt; dropped unused images.unoptimized.
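For readers who have not opened the diff, here is a minimal sketch of what threshold clamping and a validated concurrency limit could look like. It is an illustration under stated assumptions, not the code in 22fac39: the helper names `clampThreshold` and `assertValidLimit`, the 0 to 100 percentage range, and this particular `mapWithConcurrency` signature are all assumed.

```typescript
// Hypothetical helpers; names and bounds are assumptions, not the kit's code.

// Clamp a user-entered pass-rate threshold into [0, 100]; reject non-numbers.
function clampThreshold(raw: number): number {
  if (!Number.isFinite(raw)) {
    throw new RangeError("threshold must be a finite number");
  }
  return Math.min(100, Math.max(0, raw));
}

// Reject limits that would leave no worker running.
function assertValidLimit(limit: number): void {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("concurrency limit must be a positive integer");
  }
}

// Run `fn` over `items` with at most `limit` calls in flight, preserving order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  assertValidLimit(limit);
  const results = new Array<R>(items.length);
  let next = 0;
  // Each worker repeatedly claims the next unprocessed index.
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index], index);
    }
  };
  const workers = Array.from({ length: Math.min(limit, items.length) }, worker);
  await Promise.all(workers);
  return results;
}
```

Validating the limit before any worker starts means a bad value such as `0` or `NaN` fails loudly instead of producing an empty result set that could be mistaken for a completed run.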

Two I intentionally left, happy to change if you'd prefer otherwise:

react-hook-form + zod — the form is three fields with synchronous validation, so manual validation keeps the kit minimal and dependency-light.
Next 16 / React 19 — this kit was scaffolded from content-generation, which already uses these versions, so it matches the current kit ecosystem.

coderabbitai

♻️ Duplicate comments (1)

kits/llm-eval-harness/apps/app/page.tsx (1)
20-83: 📐 Maintainability & Code Quality | 🟠 Major | 🏗️ Heavy lift

Mission directive: replace manual form control with react-hook-form + zod before launch.

Lines 20-83 still rely on hand-rolled validation/state (validateGoldenSet + multiple useStates). That leaves this form outside the required app form contract and increases drift risk versus server-side rules. As per coding guidelines, kits/*/apps/**/*.{ts,tsx,js,jsx} must use react-hook-form + zod for forms.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@kits/llm-eval-harness/apps/app/page.tsx` around lines 20 - 83, Replace the
manual form validation and state management in the EvalHarnessPage component
with react-hook-form and zod. Remove the validateGoldenSet function and the
multiple useState declarations for systemPrompt, goldenSet, threshold, etc.,
then create a zod schema that encapsulates the validation logic currently in
validateGoldenSet (checking for non-empty strings, valid JSON array format,
required fields like input and criteria, and optional fields like reference and
id). Use useForm hook from react-hook-form with the zod schema via zodResolver
to manage form state and validation, replacing the manual validation logic in
handleRun and the useMemo call with the form's built-in validation methods.
Source: Coding guidelines

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: ea913545-55d3-41a7-bd2a-39e73018b38b

📥 Commits

Reviewing files that changed from the base of the PR and between 0648275 and 22fac39.

⛔ Files ignored due to path filters (1)

kits/llm-eval-harness/apps/package-lock.json is excluded by !**/package-lock.json

📒 Files selected for processing (11)

kits/llm-eval-harness/apps/actions/orchestrate.ts
kits/llm-eval-harness/apps/app/globals.css
kits/llm-eval-harness/apps/app/layout.tsx
kits/llm-eval-harness/apps/app/page.tsx
kits/llm-eval-harness/apps/components/results-table.tsx
kits/llm-eval-harness/apps/components/ui/button.tsx
kits/llm-eval-harness/apps/lib/eval.ts
kits/llm-eval-harness/apps/lib/lamatic-client.ts
kits/llm-eval-harness/apps/next.config.mjs
kits/llm-eval-harness/apps/package.json
kits/llm-eval-harness/prompts/judge_system.md

💤 Files with no reviewable changes (1)

kits/llm-eval-harness/apps/lib/lamatic-client.ts

- replace manual useState/validateGoldenSet with useForm + zodResolver
- zod schema validates systemPrompt, golden-set JSON (array of cases), threshold
- Input/Textarea forward ref so RHF register attaches correctly
- preserves the live validation UX (status pill, inline errors, gated Run button)

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@kits/llm-eval-harness/apps/app/page.tsx`:
- Around line 21-23: The `input` and `criteria` validation in the schema uses
`z.string().min(1)` which accepts whitespace-only strings like `"   "`, causing
semantic empty cases to pass through and waste evaluation runs. Update both the
`input` field and `criteria` field validation to ensure they contain actual
non-whitespace content by either using `.trim().min(1)` to remove and validate
whitespace, or adding a `.refine()` check that explicitly validates the string
contains at least one non-whitespace character after trimming.

In `@kits/llm-eval-harness/apps/components/ui/input.tsx`:
- Around line 5-8: The Input component in the Input function accepts ref as a
destructured prop, which is only valid in React 19 but violates the React 18
guideline requirement. Resolve this by first confirming the target React version
with your team. If React 18 is enforced, wrap the Input component with
React.forwardRef, remove ref from the function parameter destructuring, accept
it as the second parameter in the forwardRef callback, and pass it to the input
element. Additionally, ensure package.json React version matches the confirmed
target. If React 19 adoption is approved, update the project guidelines to
reflect the actual React version being used.
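The whitespace issue raised in the `page.tsx` comment above can be reproduced without zod. The sketch below is a dependency-free stand-in, not the kit's schema: `hasMinLength` approximates a bare `z.string().min(1)` check, `hasContent` approximates `z.string().trim().min(1)`, and the `GoldenCase` shape and `isUsableCase` helper are hypothetical.

```typescript
// Approximates z.string().min(1): only the raw length is checked.
function hasMinLength(value: string): boolean {
  return value.length >= 1;
}

// Approximates z.string().trim().min(1): whitespace is stripped first.
function hasContent(value: string): boolean {
  return value.trim().length >= 1;
}

// Hypothetical minimal shape of one golden-set case.
interface GoldenCase {
  input: string;
  criteria: string;
}

// A case is worth evaluating only if both fields have real content.
function isUsableCase(c: GoldenCase): boolean {
  return hasContent(c.input) && hasContent(c.criteria);
}

const blank = "   ";
const passesLengthOnly = hasMinLength(blank); // true: the bug in question
const passesTrimmed = hasContent(blank); // false: whitespace-only is rejected
```

A case whose `input` is only spaces would pass the length-only check and consume a target run plus a judge run before failing on content, which is the waste the comment describes.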

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5282ba74-bc1a-4daf-90f1-b331622220fd

📥 Commits

Reviewing files that changed from the base of the PR and between 22fac39 and b9a6355.

📒 Files selected for processing (3)

kits/llm-eval-harness/apps/app/page.tsx
kits/llm-eval-harness/apps/components/ui/input.tsx
kits/llm-eval-harness/apps/components/ui/textarea.tsx

- convert Button/Input/Textarea to React.forwardRef (compatible with React 18 and 19; lets RHF register attach a ref without version-specific patterns)
- zod schema: trim input/criteria so whitespace-only cases are rejected

akshatvirmani · 2026-06-23T08:01:02Z

/validate

github-actions · 2026-06-23T08:01:13Z

📡 Running Studio validation — results will appear here shortly.

github-actions · 2026-06-23T08:01:21Z

Studio Runtime Validation (Phase 2)

✅ Studio validation passed. The kit loaded successfully in Lamatic Studio.

This PR is ready for final review and merge.

Tharun2511 · 2026-06-24T08:10:40Z

@coderabbitai review

coderabbitai · 2026-06-24T08:10:47Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Tharun2511 added 4 commits June 22, 2026 20:16

chore: enforce type-checking and harden golden-set validation

0648275

- drop next.config typescript.ignoreBuildErrors (the app type-checks clean)
- type golden-set case items instead of relying on implicit any

github-actions Bot added agentkit-challenge requested-improvements labels Jun 22, 2026

coderabbitai Bot requested changes Jun 22, 2026

coderabbitai Bot requested review from amanintech and d-pamneja June 23, 2026 04:31

coderabbitai Bot reviewed Jun 23, 2026

coderabbitai Bot requested changes Jun 23, 2026

Comment thread kits/llm-eval-harness/apps/app/page.tsx Outdated

Comment thread kits/llm-eval-harness/apps/components/ui/input.tsx Outdated

Tharun2511 closed this Jun 23, 2026

Tharun2511 reopened this Jun 23, 2026

coderabbitai Bot approved these changes Jun 24, 2026


2 participants
