[feat] AI-generated code quality score — show confidence and diff clarity before applying changes

## Problem

When Onlook's AI generates or suggests changes to React components, there's no signal about how confident the model is, or how "clean" the diff is — whether it's a minimal targeted change or a large rewrite that touches many unrelated lines.

As a user, I want to know:
- Is this change high-confidence (model is sure) or exploratory?
- Is the diff surgical (changes only what I asked) or noisy (rewrites unrelated code)?
- Are there parts the model flagged as uncertain, or that need human review?

## Proposed Solution

Add a lightweight **AI change quality indicator** to the diff preview before the user applies a change:

```
┌─────────────────────────────────────────────────────┐
│  AI Change Preview                                  │
│  ────────────────────────────────────────────────── │
│  Confidence: ████████░░  82%   Scope: Surgical ✓   │
│  Changed: 4 lines   Unchanged: 96%                  │
│  ⚠️  1 line flagged for review (className conflict)  │
└─────────────────────────────────────────────────────┘
```

**Metrics to surface:**
- **Confidence score**: the model's self-reported certainty (if the model API exposes one) or, failing that, a proxy such as diff size relative to the scope of the requested change
- **Scope ratio**: lines changed / total lines in component — lower is more surgical
- **Flagged lines**: lines where the model inserted a comment like `// TODO: verify` or where a conflict was detected
- **Change type**: `style-only`, `logic`, `refactor`, or `mixed` — helps the user know what to review
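
To make the locally computable metrics concrete, here is a rough TypeScript sketch, not a proposed final implementation. All names (`DiffQuality`, `diffLines`, `analyzeDiff`, `classifyChange`) are hypothetical and not taken from the Onlook codebase. It assumes the component source is available as a string before and after the change, counts a modified line once, and uses a naive regex plus an arbitrary 0.5 threshold for the change type. Conflict detection is left out.

```typescript
// Hypothetical types and helpers for illustration only.
type ChangeType = "style-only" | "logic" | "refactor" | "mixed";

interface DiffQuality {
  changedLines: number;   // max(removed, added), so a modified line counts once
  scopeRatio: number;     // changedLines / lines in the larger version (0..1)
  flaggedLines: number[]; // 1-based line numbers in the new version
  changeType: ChangeType;
}

interface AddedLine { line: number; text: string }

// Line diff via a longest-common-subsequence table (O(n*m), fine for one component).
function diffLines(a: string[], b: string[]): { removed: string[]; added: AddedLine[] } {
  // dp[i][j] = LCS length of a[i..] and b[j..]
  const dp: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0));
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      dp[i][j] = a[i] === b[j] ? dp[i + 1][j + 1] + 1 : Math.max(dp[i + 1][j], dp[i][j + 1]);
    }
  }
  const removed: string[] = [];
  const added: AddedLine[] = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) { i++; j++; }
    else if (dp[i + 1][j] >= dp[i][j + 1]) { removed.push(a[i]); i++; }
    else { added.push({ line: j + 1, text: b[j] }); j++; }
  }
  while (i < a.length) { removed.push(a[i]); i++; }
  while (j < b.length) { added.push({ line: j + 1, text: b[j] }); j++; }
  return { removed, added };
}

// Assumed marker format, based on the `// TODO: verify` example above.
const FLAG_MARKER = /\/\/\s*TODO:\s*verify/i;
// Assumed heuristic: a line is "style" if it sets className or style.
const STYLE_HINT = /\b(className|style)\s*=/;

function classifyChange(changedTexts: string[], scopeRatio: number): ChangeType {
  if (scopeRatio > 0.5) return "refactor"; // assumed threshold, would need tuning
  const nonBlank = changedTexts.filter((t) => t.trim() !== "");
  const styleCount = nonBlank.filter((t) => STYLE_HINT.test(t)).length;
  if (nonBlank.length > 0 && styleCount === nonBlank.length) return "style-only";
  return styleCount === 0 ? "logic" : "mixed";
}

function analyzeDiff(before: string, after: string): DiffQuality {
  const oldLines = before.split("\n");
  const newLines = after.split("\n");
  const { removed, added } = diffLines(oldLines, newLines);
  const changedLines = Math.max(removed.length, added.length);
  // split() always yields at least one element, so the denominator is >= 1.
  const scopeRatio = changedLines / Math.max(oldLines.length, newLines.length);
  const flaggedLines = added.filter((l) => FLAG_MARKER.test(l.text)).map((l) => l.line);
  const changedTexts = [...removed, ...added.map((l) => l.text)];
  return { changedLines, scopeRatio, flaggedLines, changeType: classifyChange(changedTexts, scopeRatio) };
}
```

Calling `analyzeDiff(oldSource, newSource)` returns everything the info bar needs except the confidence score. A production version would probably use an existing diff library and an AST-based classifier instead of regexes.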

## Why This Matters

Right now, applying an AI suggestion in Onlook is a leap of faith. Users can't tell whether a change is a 2-line style tweak or a structural refactor that might break behavior. A lightweight confidence/clarity signal shown before applying would:
- Reduce accidental breaking changes
- Build user trust in AI-generated edits progressively
- Signal to users when to inspect before applying vs. auto-accept

This is especially important for less experienced users who may not be able to read the diff itself.

## Implementation Notes

- Most of the metadata is already computable locally from the diff (no extra API calls needed for scope ratio, changed lines, flagged comments).
- Where the model API exposes no certainty signal, the confidence score could be approximated by a proxy such as diff size relative to the length of the user's prompt.
- This could ship as a small info bar above the diff, off by default with an opt-in setting.
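
As a starting point for discussion, the proxy and the info bar text could look roughly like the sketch below. The formula `promptWords / (promptWords + changedLines)` is my own assumption, not an established metric: it only encodes the intuition that a short prompt producing a large diff deserves less confidence. The function names, the 10% "Surgical" threshold, and the "Broad" label are likewise hypothetical.

```typescript
// Hypothetical proxy: a short prompt with a large diff yields low confidence.
// Returns a value in (0, 1]; an empty diff is treated as fully confident.
function confidenceProxy(changedLines: number, prompt: string): number {
  if (changedLines === 0) return 1;
  const promptWords = prompt.trim().split(/\s+/).filter(Boolean).length;
  return promptWords / (promptWords + changedLines);
}

interface InfoBarInput {
  confidence: number;   // 0..1, from the model API or the proxy above
  changedLines: number;
  scopeRatio: number;   // 0..1, changed lines / total lines
  flaggedCount: number;
}

// Builds the text lines of the info bar, loosely following the mockup above.
function formatInfoBar(input: InfoBarInput): string[] {
  const filled = Math.round(input.confidence * 10);
  const bar = "█".repeat(filled) + "░".repeat(10 - filled);
  const percent = Math.round(input.confidence * 100);
  // Assumed threshold: at most 10% of lines touched counts as "Surgical".
  const scope = input.scopeRatio <= 0.1 ? "Surgical" : "Broad";
  const unchanged = Math.round((1 - input.scopeRatio) * 100);
  const lines = [
    `Confidence: ${bar}  ${percent}%   Scope: ${scope}`,
    `Changed: ${input.changedLines} lines   Unchanged: ${unchanged}%`,
  ];
  if (input.flaggedCount > 0) {
    const noun = input.flaggedCount === 1 ? "line" : "lines";
    lines.push(`${input.flaggedCount} ${noun} flagged for review`);
  }
  return lines;
}
```

Both functions are pure, so they can be unit-tested without rendering anything and called from whichever component ends up hosting the bar.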

Happy to contribute a PR for the UI component + the diff analysis logic if the team is interested.

[feat] AI-generated code quality score — show confidence and diff clarity before applying changes #3114
