Add support for recursive JSON Schema definitions ($ref / $defs) #374

@nikitaku11

Problem

When building structured outputs that contain recursive data structures (e.g., expression trees, nested comments, org charts), there is no way to reference a schema node from within itself. Currently, the only workaround is to manually inline the schema N levels deep, which:

  1. Bloats the schema: each level embeds another full copy of the node definition, so the JSON payload sent to the provider grows with every level of depth (and exponentially if a node references itself in more than one property)
  2. Imposes an artificial depth limit — you must choose a max depth at build time, and anything deeper is silently degraded to leaf-only nodes
  3. Likely hurts model accuracy: providers like OpenAI handle $ref natively in structured outputs, and compact, canonical schemas should be easier for a model to follow than deeply inlined duplicates
  4. Makes code harder to maintain — recursive structures require helper methods with depth counters instead of a straightforward declarative definition
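
To make points 1 and 2 concrete, here is a rough sketch that inlines a simplified expression node as plain PHP arrays and measures the encoded size at a few depths. The functions inlineExpression and inlinedSize are hypothetical helpers written for this illustration; they are not part of the framework, and the node is reduced to two properties.

```php
<?php
// Hypothetical helper: builds a simplified expression schema as a plain
// array, inlining the child schema $depth levels deep (no $ref involved).
function inlineExpression(int $depth): array
{
    $node = [
        'type' => 'object',
        'properties' => [
            'operation' => ['type' => ['string', 'null']],
        ],
        'required' => ['operation'],
    ];

    if ($depth > 0) {
        // Each level embeds a full copy of the level below it.
        $node['properties']['children'] = [
            'type' => ['array', 'null'],
            'items' => inlineExpression($depth - 1),
        ];
        $node['required'][] = 'children';
    }

    // At depth 0 the node has no 'children' property: a leaf-only node.
    return $node;
}

// Size in bytes of the JSON-encoded schema at a given depth.
function inlinedSize(int $depth): int
{
    return strlen(json_encode(inlineExpression($depth)));
}

foreach ([1, 5, 10] as $depth) {
    printf("depth %2d: %d bytes\n", $depth, inlinedSize($depth));
}
```

With a single self-reference the size grows by a fixed amount per level. The $ref form stays the same size regardless of how deep the data goes.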

Real-world use case

A rule/formula builder where expressions form a tree:

{
  "operation": "&&",
  "children": [
    {
      "operation": ">",
      "children": [
        { "operation": null, "formula_value": { "type": "value", "form_element_id": 12 } },
        { "operation": null, "formula_value": { "type": "number", "constant": 100 } }
      ]
    },
    {
      "operation": null,
      "formula_value": { "type": "option_selected", "form_option_id": 45 }
    }
  ]
}

The children array contains items of the same type as the parent — a textbook recursive schema.
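
As an illustration of that recursive shape, a minimal sketch that walks the example above and reports its nesting depth. expressionDepth is a hypothetical helper, and the sketch assumes the tree was decoded with json_decode($json, true).

```php
<?php
// Returns the nesting depth of a decoded expression tree.
// A node whose 'children' is null or missing counts as a leaf (depth 1).
function expressionDepth(array $node): int
{
    $children = $node['children'] ?? [];
    $deepest = 0;
    foreach ($children as $child) {
        // Every child has the same shape as its parent, so recurse.
        $deepest = max($deepest, expressionDepth($child));
    }
    return 1 + $deepest;
}

$json = <<<'JSON'
{
  "operation": "&&",
  "children": [
    {
      "operation": ">",
      "children": [
        { "operation": null, "formula_value": { "type": "value", "form_element_id": 12 } },
        { "operation": null, "formula_value": { "type": "number", "constant": 100 } }
      ]
    },
    {
      "operation": null,
      "formula_value": { "type": "option_selected", "form_option_id": 45 }
    }
  ]
}
JSON;

$tree = json_decode($json, true);
echo expressionDepth($tree), "\n"; // prints 3
```

Nothing in the data bounds this depth, which is why a schema inlined to a fixed number of levels cannot describe it in general.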

Expected behavior

Something like:

$expression = $schema->object([
    'operation' => $schema->string()->nullable()->required(),
    'children' => $schema->array()
        ->nullable()
        ->items($schema->ref('formula_expression'))
        ->required(),
    'formula_value' => $formulaValue->nullable()->required(),
])->name('formula_expression'); // registers in $defs

This would produce standard JSON Schema output along these lines (the formula_value definition is omitted here):

{
  "$defs": {
    "formula_expression": {
      "type": "object",
      "properties": {
        "operation": { "type": ["string", "null"] },
        "children": {
          "type": ["array", "null"],
          "items": { "$ref": "#/$defs/formula_expression" }
        },
        "formula_value": { "$ref": "#/$defs/formula_value" }
      },
      "required": ["operation", "children", "formula_value"]
    }
  }
}

API proposal

Suggested minimal API surface (open to discussion):

| Method | Purpose |
| --- | --- |
| $schema->ref(string $name) | Emit a { "$ref": "#/$defs/{name}" } pointer |
| ->name(string $name) | Register the current node under $defs with the given name |

Alternatively, a single combined method could work:

$schema->define('formula_expression', function (JsonSchema $schema) {
    return $schema->object([
        'children' => $schema->array()->items($schema->ref('formula_expression')),
    ]);
});
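
To show that the bookkeeping behind either variant is small, here is a rough sketch using plain arrays. RefRegistry and its toDocument method are hypothetical stand-ins invented for this illustration; they are not the framework's JsonSchema builder. Emitting a root-level $ref next to $defs is likewise an assumption, and some consumers may expect the root object inline instead.

```php
<?php
// Hypothetical stand-in for the proposed API; not the framework's builder.
final class RefRegistry
{
    /** @var array<string, array> Named definitions, emitted under $defs. */
    private array $defs = [];

    // Returns a pointer to a named definition; the name may be defined later.
    public function ref(string $name): array
    {
        return ['$ref' => '#/$defs/' . $name];
    }

    // Builds a schema via $build, stores it under $name, returns a pointer.
    public function define(string $name, callable $build): array
    {
        $this->defs[$name] = $build($this);
        return $this->ref($name);
    }

    // Produces the root document: a pointer to the entry point plus $defs.
    public function toDocument(string $root): array
    {
        if (!isset($this->defs[$root])) {
            throw new InvalidArgumentException("Unknown definition: $root");
        }
        return ['$ref' => '#/$defs/' . $root, '$defs' => $this->defs];
    }
}

$registry = new RefRegistry();
$registry->define('formula_expression', fn (RefRegistry $s) => [
    'type' => 'object',
    'properties' => [
        'operation' => ['type' => ['string', 'null']],
        'children' => [
            'type' => ['array', 'null'],
            // Self-reference: resolved by name, so no depth counter is needed.
            'items' => $s->ref('formula_expression'),
        ],
    ],
    'required' => ['operation', 'children'],
]);

$doc = $registry->toDocument('formula_expression');
echo json_encode($doc, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), "\n";
```

Because ref() only emits a name, a definition can point at itself before it has finished building, which is what makes the recursive case work.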

Provider compatibility

| Provider | $ref / $defs support |
| --- | --- |
| OpenAI (structured outputs) | ✅ Fully supported and documented |
| Anthropic (tool use) | ✅ Supported in tool input_schema |
| Google Gemini | ⚠️ Limited; may need inlining as fallback |

For providers that don't support $ref, the framework could automatically inline the schema up to a configurable depth as a fallback — which is exactly what users have to do manually today.
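
A rough sketch of what such a fallback could look like, operating on plain arrays. The function inlineRefs, its parameters, and the idea of passing an explicit leaf schema are all assumptions made for this illustration and do not describe existing framework behavior.

```php
<?php
// Hypothetical fallback: expands local "#/$defs/<name>" pointers up to
// $maxDepth levels and substitutes $leaf where a deeper pointer occurs.
function inlineRefs(array $schema, array $defs, int $maxDepth, array $leaf): array
{
    $prefix = '#/$defs/';
    $ref = $schema['$ref'] ?? null;

    if (is_string($ref) && strpos($ref, $prefix) === 0) {
        $name = substr($ref, strlen($prefix));
        if (!isset($defs[$name])) {
            return $schema; // unknown definition: leave the pointer untouched
        }
        if ($maxDepth <= 0) {
            return $leaf;   // depth budget exhausted: degrade to a leaf node
        }
        return inlineRefs($defs[$name], $defs, $maxDepth - 1, $leaf);
    }

    // Not a pointer: recurse into nested arrays (properties, items, ...).
    foreach ($schema as $key => $value) {
        if (is_array($value)) {
            $schema[$key] = inlineRefs($value, $defs, $maxDepth, $leaf);
        }
    }
    return $schema;
}

$defs = [
    'formula_expression' => [
        'type' => 'object',
        'properties' => [
            'operation' => ['type' => ['string', 'null']],
            'children' => [
                'type' => ['array', 'null'],
                'items' => ['$ref' => '#/$defs/formula_expression'],
            ],
        ],
        'required' => ['operation', 'children'],
    ],
];

// Leaf-only node used once the depth budget runs out.
$leaf = [
    'type' => 'object',
    'properties' => ['operation' => ['type' => ['string', 'null']]],
    'required' => ['operation'],
];

$inlined = inlineRefs(['$ref' => '#/$defs/formula_expression'], $defs, 2, $leaf);
echo json_encode($inlined, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), "\n";
```

With a depth of 2, the result contains two full expression levels followed by the leaf node and no remaining $ref, matching what the manual workaround produces today.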

Current workaround

Recursive depth-limited inlining via helper methods:

private function buildFormulaExpression(
    JsonSchema $schema,
    mixed $formulaValue,
    array $enumValues,
    int $depth
): mixed {
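    if ($depth <= 0) {
        // Base case: return a leaf-only node with no 'children' property;
        // without this return the recursion below never terminates.
        return $schema->object([
            // ... leaf fields only
        ]);
    }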

    $child = $this->buildFormulaExpression(
        $schema, $formulaValue, $enumValues, $depth - 1
    );

    return $schema->object([
        'children' => $schema->array()->items($child)->nullable(),
        // ...
    ]);
}

This works, but it produces schemas many times larger than the $ref equivalent and imposes an artificial limit on nesting depth.
