[feat] AI-generated code quality score — show confidence and diff clarity before applying changes #3114

@Ruthwik-Data

Description

Problem

When Onlook's AI generates or suggests changes to React components, there's no signal about how confident the model is or how "clean" the diff is: whether it's a minimal targeted change or a large rewrite that touches many unrelated lines.

As a user, I want to know:

  • Is this change high-confidence (model is sure) or exploratory?
  • Is the diff surgical (changes only what I asked) or noisy (rewrites unrelated code)?
  • Are there parts the model flagged as uncertain or that need human review?

Proposed Solution

Add a lightweight AI change quality indicator to the diff preview before the user applies a change:

┌─────────────────────────────────────────────────────┐
│  AI Change Preview                                  │
│  ────────────────────────────────────────────────── │
│  Confidence: ████████░░  82%   Scope: Surgical ✓   │
│  Changed: 4 lines   Unchanged: 96%                  │
│  ⚠️  1 line flagged for review (className conflict)  │
└─────────────────────────────────────────────────────┘

Metrics to surface:

  • Confidence score: model's self-reported certainty (if using a model API that supports it) or a proxy (diff size vs. requested change scope)
  • Scope ratio: lines changed / total lines in component — lower is more surgical
  • Flagged lines: lines where the model inserted a comment like // TODO: verify or where a conflict was detected
  • Change type: style-only, logic, refactor, or mixed — helps the user know what to review

Why This Matters

Right now, applying an AI suggestion in Onlook is a leap of faith. Users can't tell whether a change is a 2-line style tweak or a structural refactor that might break behavior. A lightweight confidence/clarity signal before applying would:

  • Reduce accidental breaking changes
  • Build user trust in AI-generated edits progressively
  • Signal to users when to inspect before applying vs. auto-accept

This is especially important for less experienced users who may not be able to read the diff itself.

Implementation Notes

  • Most of the metadata is already computable locally from the diff (no extra API calls needed for scope ratio, changed lines, flagged comments).
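  • Confidence score could be approximated by a heuristic such as diff size relative to the user prompt length. This is a proxy, not the model's actual certainty, and should be labeled as such when the model API does not expose a real score.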
  • This could ship as a small info bar above the diff, off by default with an opt-in setting.
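As a rough illustration of the "diff size relative to prompt length" idea, the sketch below scores a change at 1.0 when the diff is no larger than the prompt would suggest, and lets the score decay as the diff grows past that. The function name `confidenceProxy` and the constant `LINES_PER_PROMPT_WORD` are made up for this example, and the constant's value is an untuned guess that would need calibrating against real edits.

```typescript
// Assumed tuning constant: how many changed lines one prompt word "justifies".
// The value 0.5 is a placeholder, not a measured number.
const LINES_PER_PROMPT_WORD = 0.5;

// Returns a value in [0, 1]. This is a size-based proxy, not model certainty.
function confidenceProxy(changedLines: number, prompt: string): number {
  const words = prompt.trim().split(/\s+/).filter(Boolean).length;
  // With no prompt text there is nothing to compare the diff against.
  if (words === 0) return 0;

  const expectedLines = words * LINES_PER_PROMPT_WORD;
  // 1.0 while the diff stays within the expected size, then inversely
  // proportional to how far the diff exceeds it.
  const score = expectedLines / Math.max(expectedLines, changedLines);
  return Math.round(score * 100) / 100;
}
```

A short prompt that produces a large diff scores low, which matches the "noisy rewrite" case described in the Problem section. A long, detailed prompt that produces a small diff always scores 1.0, which is a known weakness of using size alone.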

Happy to contribute a PR for the UI component + the diff analysis logic if the team is interested.

Labels: enhancement (New feature or request)
