QA in-concept tasks: evaluate typo + US/UK spelling tools via OCL AI Assistant (or not) #2404

@filiperochalopes

Description

User story

As a curator, I want to run QA in-concept checks using OCL AI Assistant to detect typos and US/UK spelling variants so concept names can be validated consistently.

Use case

Run in-concept QA on clinical names to (1) detect typos, (2) detect US vs UK spelling, and (3) suggest a new en-GB FSN by copying the US FSN and converting it to UK spelling.

Requirements

Acceptance criteria

  • Typos are detected and returned as structured JSON (original, corrected, changes, confidence, notes).
  • US/UK spelling variant is classified as US/UK/MIXED/UNCERTAIN with evidence in structured JSON.
  • US → UK conversion returns structured JSON (original, converted, changes, confidence, notes).
  • When UK spelling is suggested and no compatible name exists, the output includes a suggestion to create an en-GB FSN name.
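The last criterion could be checked with something like the sketch below. It is only an illustration: the dict keys (`name`, `locale`, `name_type`), the `suggested` flag, and the function name are assumptions, and "compatible name" is interpreted here as an existing en-GB name with the same text, ignoring case.

```python
def suggest_en_gb_fsn(names, uk_text):
    """Return a suggested en-GB FSN entry, or None if a compatible name exists.

    `names` is a list of dicts with assumed keys "name" and "locale".
    """
    for entry in names:
        same_locale = entry.get("locale") == "en-GB"
        same_text = entry.get("name", "").casefold() == uk_text.casefold()
        if same_locale and same_text:
            # Assumption: any en-GB name with the same text counts as compatible.
            return None
    return {
        "name": uk_text,
        "locale": "en-GB",
        "name_type": "Fully Specified",  # assumed label for an FSN
        "suggested": True,  # hypothetical flag marking this as a proposal only
    }


existing = [{"name": "Anemia", "locale": "en", "name_type": "Fully Specified"}]
print(suggest_en_gb_fsn(existing, "Anaemia"))
```

The function only builds a suggestion; whether and how it is written back to the concept is left open.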

More details

Prompts and test notes

Prompt 1 — typo recognition (use with medgemma:4b)

You are a strict English spelling reviewer for clinical terminology.

Task:
Detect whether the input contains spelling mistakes or obvious typographical errors.
Do not rewrite for style.
Do not simplify.
Do not expand abbreviations unless the typo makes the intended word unmistakable.
Preserve clinical meaning.

Return valid JSON only with this schema:
{
  "original": string,
  "has_typo": boolean,
  "corrected": string,
  "changes": [
    {
      "from": string,
      "to": string,
      "reason": "typo" | "variant_spelling" | "uncertain"
    }
  ],
  "confidence": number,
  "notes": string
}

Rules:

  • If there is no typo, set "corrected" equal to the original text.
  • If uncertain, keep the original text and explain uncertainty in "notes".
  • Do not convert US spelling to UK spelling or vice versa unless explicitly asked.
  • Treat valid medical terminology conservatively.

Input:
{{TEXT}}

Example input:
Acute mycardial infarction

Expected output (summary):

  • has_typo: true
  • corrected: Acute myocardial infarction
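Since small models do not always return the requested JSON, a test harness could validate each reply against the Prompt 1 schema before scoring it. A minimal sketch, assuming the reply arrives as a raw string; the function name and error messages are illustrative, not part of the issue:

```python
import json

ALLOWED_REASONS = {"typo", "variant_spelling", "uncertain"}
EXPECTED_TYPES = {
    "original": str,
    "has_typo": bool,
    "corrected": str,
    "changes": list,
    "confidence": (int, float),
    "notes": str,
}


def validate_typo_response(raw):
    """Return a list of problems; an empty list means the reply conforms."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif not isinstance(data[key], expected):
            problems.append(f"wrong type for key: {key}")
    if problems:
        return problems

    for change in data["changes"]:
        if not isinstance(change, dict) or change.get("reason") not in ALLOWED_REASONS:
            problems.append(f"invalid change entry: {change!r}")
    # Rule from the prompt: with no typo, "corrected" equals the original text.
    if not data["has_typo"] and data["corrected"] != data["original"]:
        problems.append('"corrected" differs from "original" although has_typo is false')
    return problems
```

The check does not constrain the range of `confidence`, because the prompt only says it is a number.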

Prompt 2 — detect US vs UK spelling

You are a strict linguistic classifier for English spelling variants.

Task:
Classify the input spelling as:

  • "US"
  • "UK"
  • "MIXED"
  • "UNCERTAIN"

Return valid JSON only:
{
  "original": string,
  "classification": "US" | "UK" | "MIXED" | "UNCERTAIN",
  "evidence": [
    {
      "term": string,
      "variant": "US" | "UK" | "AMBIGUOUS"
    }
  ],
  "normalized_us": string,
  "normalized_uk": string,
  "confidence": number,
  "notes": string
}

Rules:

  • Do not rewrite beyond spelling normalization.
  • If the text has no diagnostic spelling clues, return "UNCERTAIN".
  • If both systems appear, return "MIXED".
  • Do not change medical meaning.
  • Be conservative with technical and clinical terms.

Input:
{{TEXT}}

Example inputs:

  • The patient has anemia and will undergo a pediatric evaluation.
  • The patient has anaemia and will undergo a paediatric evaluation.
  • The patient has anemia and was admitted to the theatre.
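For comparison with the model output, the classification rules can be approximated with a deterministic word-list lookup. The sketch below is a baseline under strong assumptions: the four word pairs are illustrative only, and the result carries just a subset of the schema fields (`original`, `classification`, `evidence`).

```python
import re

# Tiny illustrative US -> UK pairs; a real list would be far larger.
US_TO_UK = {
    "anemia": "anaemia",
    "pediatric": "paediatric",
    "hematology": "haematology",
    "theater": "theatre",
}
UK_WORDS = set(US_TO_UK.values())


def classify_variant(text):
    """Classify text as US, UK, MIXED, or UNCERTAIN from word-list evidence."""
    evidence = []
    for word in re.findall(r"[A-Za-z]+", text):
        lower = word.lower()
        if lower in US_TO_UK:
            evidence.append({"term": word, "variant": "US"})
        elif lower in UK_WORDS:
            evidence.append({"term": word, "variant": "UK"})

    variants = {item["variant"] for item in evidence}
    if variants == {"US"}:
        classification = "US"
    elif variants == {"UK"}:
        classification = "UK"
    elif variants:
        classification = "MIXED"  # both systems appear
    else:
        classification = "UNCERTAIN"  # no diagnostic spelling clues
    return {"original": text, "classification": classification, "evidence": evidence}


for sample in (
    "The patient has anemia and will undergo a pediatric evaluation.",
    "The patient has anaemia and will undergo a paediatric evaluation.",
    "The patient has anemia and was admitted to the theatre.",
):
    print(classify_variant(sample)["classification"])
```

With this word list the three example inputs come out as US, UK, and MIXED respectively.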

Prompt 3 — convert US → UK spelling (test with translategemma:4b and T5-based model)

You are a precise US-to-UK English spelling converter for clinical and technical text.

Task:
Convert only spelling and directly related orthographic conventions from US English to UK English.
Preserve meaning, punctuation, casing, formatting, and sentence structure as much as possible.

Return valid JSON only:
{
  "original": string,
  "converted": string,
  "changes": [
    {
      "from": string,
      "to": string,
      "reason": "US_to_UK_spelling"
    }
  ],
  "confidence": number,
  "notes": string
}

Rules:

  • Only change orthography when clearly appropriate.
  • Do not paraphrase.
  • Do not add or remove information.
  • Do not translate to another language.
  • Preserve valid drug names, codes, acronyms, and proper nouns.
  • If no US spelling is present, return the original unchanged.
  • In clinical text, prioritize terminology fidelity over style.

Input:
{{TEXT}}

Example input:
The patient was admitted to the pediatric hematology unit for evaluation of anemia.

Expected output (approx):
The patient was admitted to the paediatric haematology unit for evaluation of anaemia.
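A dictionary-based converter gives a deterministic reference against which the model conversions can be compared. This is a sketch, not a proposed implementation: the three word pairs, the fixed `confidence` of 1.0, and the `notes` text are all assumptions made for illustration.

```python
import re

# Illustrative US -> UK pairs only; a usable tool needs a curated list.
US_TO_UK = {
    "pediatric": "paediatric",
    "hematology": "haematology",
    "anemia": "anaemia",
}


def convert_us_to_uk(text):
    """Replace known US spellings with UK spellings, preserving casing."""
    changes = []

    def replace(match):
        word = match.group(0)
        uk = US_TO_UK.get(word.lower())
        if uk is None:
            return word  # unknown words, codes, and acronyms are left untouched
        if word.isupper():
            uk = uk.upper()
        elif word[0].isupper():
            uk = uk.capitalize()
        changes.append({"from": word, "to": uk, "reason": "US_to_UK_spelling"})
        return uk

    converted = re.sub(r"[A-Za-z]+", replace, text)
    return {
        "original": text,
        "converted": converted,
        "changes": changes,
        "confidence": 1.0,  # assumed: a lookup has no uncertainty of its own
        "notes": "dictionary lookup only",
    }


result = convert_us_to_uk(
    "The patient was admitted to the pediatric hematology unit for evaluation of anemia."
)
print(result["converted"])
```

Because only whole alphabetic words in the list are replaced, punctuation, sentence structure, and everything outside the word list stay unchanged.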

Prompt 4 — clinical-safe wrapper (mode-based)

You are a conservative orthographic normalizer for English clinical terminology.

Task:
Review the text and perform one of these actions only:

  1. detect typos,
  2. detect US vs UK spelling,
  3. convert US spelling to UK spelling.

Mode:
{{MODE}}

Allowed modes:

  • TYPO_CHECK
  • VARIANT_DETECTION
  • US_TO_UK

Critical rules:

  • Never alter ICD, SNOMED, LOINC, RxNorm, brand names, generic drug names, abbreviations, IDs, or proper nouns unless there is an obvious character-level typo.
  • Never paraphrase.
  • Never replace a clinical concept with a more common lay term.
  • If uncertain, keep the original.
  • Return JSON only.

Input:
{{TEXT}}
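Filling the `{{MODE}}` and `{{TEXT}}` placeholders can be done with plain string replacement, rejecting any mode outside the allowed list before the prompt is sent. A minimal sketch; `render_prompt` and the shortened template are hypothetical names used only for this example.

```python
ALLOWED_MODES = {"TYPO_CHECK", "VARIANT_DETECTION", "US_TO_UK"}


def render_prompt(template, mode, text):
    """Fill the {{MODE}} and {{TEXT}} placeholders of a prompt template."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported mode: {mode}")
    # Replace {{MODE}} first so a literal "{{MODE}}" inside the input text
    # is never substituted.
    return template.replace("{{MODE}}", mode).replace("{{TEXT}}", text)


template = "Mode:\n{{MODE}}\n\nInput:\n{{TEXT}}"
print(render_prompt(template, "US_TO_UK", "Pediatric anemia"))
```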

Deterministic tool to test

Local model notes (baseline comparison)

  • gemma3:4b-it-qat
  • gemma3:4b
  • gemma3:12b (if available)
  • qwen3:4b-instruct
