
Recursive JSON Schemas #330


Closed
sahewat opened this issue Oct 31, 2023 · 8 comments

@sahewat

sahewat commented Oct 31, 2023

Recursive Pydantic models do not seem to be supported when the recursion goes through a `List`, `Union`, or `Optional` field. As I understand it, these are the basic use cases.

A reproducible example is provided below:

```python
import json
from typing import List, Optional, Union

from pydantic import BaseModel

from outlines.text.json_schema import build_regex_from_schema


# Recursion through Optional[...]
class TaskOptional(BaseModel):
    subtask: Optional['TaskOptional']

class TaskWrapperOptional(BaseModel):
    task: TaskOptional


# Recursion through Union[..., None]
class TaskUnion(BaseModel):
    subtask: Union['TaskUnion', None]

class TaskWrapperUnion(BaseModel):
    task: TaskUnion


# Recursion through List[...]
class TaskList(BaseModel):
    subtask: List['TaskList']

class TaskWrapperList(BaseModel):
    task: TaskList


# Resolve the forward references before generating the JSON schemas.
TaskWrapperOptional.model_rebuild()
TaskWrapperUnion.model_rebuild()
TaskWrapperList.model_rebuild()

regex = build_regex_from_schema(json.dumps(TaskWrapperOptional.model_json_schema()))
# NotImplementedError:
# @ outlines/text/json_schema.py:308

regex = build_regex_from_schema(json.dumps(TaskWrapperUnion.model_json_schema()))
# NotImplementedError:
# @ outlines/text/json_schema.py:308

regex = build_regex_from_schema(json.dumps(TaskWrapperList.model_json_schema()))
# RecursionError: maximum recursion depth exceeded while calling a Python object
# @ outlines/text/json_schema.py:174
```

I'd be interested in adding this functionality, but I'm unsure what an "unrolled" recursive definition would look like in the generated regex.
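One way to picture an "unrolled" definition: cap the recursion at a fixed depth, expand the `$ref` inline that many times, and allow only the non-recursive branch (`null`) at the cutoff. The sketch below does this by hand for `TaskWrapperUnion` (the `Optional` variant has the same schema shape). The helper `task_wrapper_union_regex` and its `max_depth` parameter are hypothetical, not outlines' API, and the regex assumes compact JSON with no whitespace.

```python
import re


def task_wrapper_union_regex(max_depth: int) -> str:
    """Hypothetical helper: regex for TaskWrapperUnion with recursion cut at max_depth.

    Assumes compact JSON (no whitespace) and the single required "subtask" key.
    """
    # At the cutoff level the $ref is no longer expanded, so only null remains.
    subtask = "null"
    for _ in range(max_depth):
        # Each unrolled level: either another TaskUnion object or null.
        subtask = r'(?:\{"subtask":' + subtask + r'\}|null)'
    task_union = r'\{"subtask":' + subtask + r'\}'
    return r'\{"task":' + task_union + r'\}'


pattern = re.compile(task_wrapper_union_regex(max_depth=2))
print(bool(pattern.fullmatch('{"task":{"subtask":{"subtask":{"subtask":null}}}}')))  # True
```

Anything nested deeper than `max_depth` is rejected; that is the trade-off of expressing a recursive schema as a regular expression.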

@rlouf
Member

rlouf commented Nov 1, 2023

Thank you for opening an issue! It looks like something we should be able to support. Do you mind pasting the JSON schema for these 3 models here?

@brandonwillard brandonwillard added bug structured generation Linked to structured generation JSON labels Nov 1, 2023
@sahewat
Author

sahewat commented Nov 1, 2023

In all cases, the schema is available through TaskWrapperxxx.model_json_schema(). I'll post an example of each here as well.

TaskWrapperOptional

```json
{
    "$defs": {
        "TaskOptional": {
            "properties": {
                "subtask": {
                    "anyOf": [
                        {
                            "$ref": "#/$defs/TaskOptional"
                        },
                        {
                            "type": "null"
                        }
                    ]
                }
            },
            "required": [
                "subtask"
            ],
            "title": "TaskOptional",
            "type": "object"
        }
    },
    "properties": {
        "task": {
            "$ref": "#/$defs/TaskOptional"
        }
    },
    "required": [
        "task"
    ],
    "title": "TaskWrapperOptional",
    "type": "object"
}
```

TaskWrapperUnion

```json
{
    "$defs": {
        "TaskUnion": {
            "properties": {
                "subtask": {
                    "anyOf": [
                        {
                            "$ref": "#/$defs/TaskUnion"
                        },
                        {
                            "type": "null"
                        }
                    ]
                }
            },
            "required": [
                "subtask"
            ],
            "title": "TaskUnion",
            "type": "object"
        }
    },
    "properties": {
        "task": {
            "$ref": "#/$defs/TaskUnion"
        }
    },
    "required": [
        "task"
    ],
    "title": "TaskWrapperUnion",
    "type": "object"
}
```

TaskWrapperList

```json
{
    "$defs": {
        "TaskList": {
            "properties": {
                "subtask": {
                    "items": {
                        "$ref": "#/$defs/TaskList"
                    },
                    "title": "Subtask",
                    "type": "array"
                }
            },
            "required": [
                "subtask"
            ],
            "title": "TaskList",
            "type": "object"
        }
    },
    "properties": {
        "task": {
            "$ref": "#/$defs/TaskList"
        }
    },
    "required": [
        "task"
    ],
    "title": "TaskWrapperList",
    "type": "object"
}
```

@brandonwillard brandonwillard linked a pull request May 9, 2024 that will close this issue
@brandonwillard brandonwillard changed the title Recursive Pydantic objects are not supported for basic patterns Recursive JSON Schemas May 9, 2024
@brandonwillard
Member

brandonwillard commented May 9, 2024

Anyone who is interested in this feature should know that CFG-structured generation is required to truly support it.
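To see why a grammar helps: the `TaskList` schema describes a context-free language, so a recursive parser accepts any nesting depth without unrolling, which no single regex can do. The recognizer below is a hand-written recursive-descent sketch (the name `accepts_task_list` is hypothetical, and it handles compact JSON only), not outlines' CFG implementation; roughly speaking, CFG-structured generation relies on a parser of this kind to decide which continuations are still valid.

```python
def accepts_task_list(text: str) -> bool:
    """Hypothetical recognizer for compact JSON matching the TaskList schema.

    Grammar: task := '{"subtask":[' [ task (',' task)* ] ']}'
    """
    pos = 0

    def expect(token: str) -> None:
        # Consume a literal token or fail.
        nonlocal pos
        if not text.startswith(token, pos):
            raise ValueError(f"expected {token!r} at offset {pos}")
        pos += len(token)

    def task() -> None:
        expect('{"subtask":[')
        # Zero or more nested tasks: this recursive call is what a regex cannot express.
        if text.startswith("{", pos):
            task()
            while text.startswith(",", pos):
                expect(",")
                task()
        expect("]}")

    try:
        task()
    except ValueError:
        return False
    # Reject trailing input after a complete task.
    return pos == len(text)


depth = 200
deep = '{"subtask":[' * depth + '{"subtask":[]}' + "]}" * depth
print(accepts_task_list(deep))  # True
```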

@lapp0 lapp0 mentioned this issue Jul 25, 2024
@hugocool

hugocool commented Oct 8, 2024

I saw there is a pull request that implements a beta version of CFG-guided generation, which is amazing. But the PR failed. Is it everything that's necessary to make this functionality available?
The PR failed due to a regression in a performance benchmark (I believe in a measurement that didn't really exist before), so is there anything I can do to test or help get this recursive-field functionality over the line?
I really need this recursive functionality and am considering switching to a different structured generation library (https://github.com/noamgat/lm-format-enforcer); however, that one seems much less mature, and I do intend to use this for production use cases.

Anyway, if there is anything I can do to help, please let me know.

@lapp0
Contributor

lapp0 commented Oct 9, 2024

@hugocool To have stable CFG-based JSON generation, we need:

There may be other paths forward, but this is the approach immediately obvious to me.

This isn't an area of focus of mine at the moment, but if you're interested in tackling either issue, please let me know what questions you have!

@hugocool

hugocool commented Oct 9, 2024

Okay, I am willing to pick this up.
I have some questions. What is the current state of the Lark grammar generation? Which elements of the PR (lapp0#85), or of any other attempt for that matter, can I build upon?
Are you aware of lm-format-enforcer and their approach to solving this problem? What lessons from their approach should we incorporate?

I think I would start by generating a Lark grammar for my specific use case, which is a specific recursive JSON model. Then, if that works, we can see how to generalize it so it works for any arbitrary JSON schema.
I am assuming I should build off of these examples:

Are there any more resources I should be aware of?

Lastly, I am assuming that the second issue you mentioned would come into play once we want to generalize the solution to JSON more broadly, right?
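As a possible starting point for generating a grammar from one specific recursive model, the sketch below translates the small JSON Schema subset used in this thread (objects, arrays, strings, `null`, `anyOf`, local `$ref`s) into a Lark-style grammar string. Each `$defs` entry becomes its own nonterminal, so recursion is preserved rather than unrolled. The function `schema_to_lark` is hypothetical: it emits every property in schema order, ignores `required`, and allows no whitespace, so it is far from a general converter.

```python
import json


def schema_to_lark(schema: dict) -> str:
    """Hypothetical converter from a tiny JSON Schema subset to a Lark grammar string."""
    rules: dict[str, str] = {}

    def nonterminal(ref: str) -> str:
        # "#/$defs/TaskList" -> "tasklist" (Lark rule names are lowercase).
        return ref.split("/")[-1].lower()

    def expr(node: dict) -> str:
        if "$ref" in node:
            return nonterminal(node["$ref"])
        if "anyOf" in node:
            return "(" + " | ".join(expr(option) for option in node["anyOf"]) + ")"
        kind = node.get("type")
        if kind == "null":
            return '"null"'
        if kind == "string":
            return "ESCAPED_STRING"
        if kind == "array":
            item = expr(node["items"])
            return f'"[" [{item} ("," {item})*] "]"'
        if kind == "object":
            members = []
            for i, (key, sub) in enumerate(node["properties"].items()):
                comma = '"," ' if i else ""
                members.append(f'{comma}"\\"{key}\\":" {expr(sub)}')
            return '"{" ' + " ".join(members) + ' "}"'
        raise NotImplementedError(f"unsupported schema node: {node}")

    # One rule per definition: a $ref back to itself simply becomes a recursive rule.
    for name, definition in schema.get("$defs", {}).items():
        rules[name.lower()] = expr(definition)
    lines = [f"start: {expr(schema)}"]
    lines += [f"{name}: {body}" for name, body in rules.items()]
    if any("ESCAPED_STRING" in line for line in lines):
        lines.append("%import common.ESCAPED_STRING")
    return "\n".join(lines)


task_wrapper_list = json.loads("""
{"$defs": {"TaskList": {"properties": {"subtask": {"items": {"$ref": "#/$defs/TaskList"},
  "title": "Subtask", "type": "array"}}, "required": ["subtask"],
  "title": "TaskList", "type": "object"}},
 "properties": {"task": {"$ref": "#/$defs/TaskList"}},
 "required": ["task"], "title": "TaskWrapperList", "type": "object"}
""")
print(schema_to_lark(task_wrapper_list))
```

The `tasklist` rule refers to itself, which is exactly the unbounded recursion a regex-based approach has to cut off at some depth.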

@rlouf rlouf closed this as completed Feb 24, 2025
@hugocool

Is this working now? Did you guys fix it? If so, is there a branch or release I can pull to test it out? And does it work using CFG or a different technique?

@rlouf
Member

rlouf commented Feb 24, 2025

It should work (with fixed depth). Implementation is in outlines-core.
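For the `List` case, "fixed depth" means the array of subtasks must be empty at the cutoff level. The sketch below (hypothetical `task_list_regex`, compact JSON only, not the outlines-core implementation) also suggests why the depth has to stay small: each level embeds the level below twice (the first item and the repeated `,item`), so the pattern more than doubles in length per level.

```python
import re


def task_list_regex(max_depth: int) -> str:
    """Hypothetical helper: regex for a TaskList nested at most max_depth levels below the root."""
    # Cutoff level: no further recursion, so the subtask array must be empty.
    task = r'\{"subtask":\[\]\}'
    for _ in range(max_depth):
        # One more level: an array of zero or more tasks from the level below.
        task = r'\{"subtask":\[(?:' + task + r'(?:,' + task + r')*)?\]\}'
    return task


# Pattern length grows exponentially with the depth limit.
for depth in range(4):
    print(depth, len(task_list_regex(depth)))
```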
