
Commit 9554b2c

Update the documentation to include tools and outputs
1 parent a89ab6f commit 9554b2c

25 files changed: +485 −153 lines

docs/features/advanced/backends.md

Lines changed: 2 additions & 2 deletions
@@ -25,11 +25,11 @@ model = outlines.from_transformers(
 )

 result = model("What is the capital of France?", output_type, backend="llguidance")
-print(result) # 'Paris'
+print(result.content) # 'Paris'

 generator = outlines.Generator(model, output_type)
 result = generator("What is the capital of France?", backend="xgrammar")
-print(result) # 'Paris'
+print(result.content) # 'Paris'
 ```

 If you do not provide a value for the `backend` argument, the default value is used. The default depends on the output type:

docs/features/advanced/logits_processors.md

Lines changed: 2 additions & 2 deletions
@@ -44,7 +44,7 @@ logits_processor = RegexLogitsProcessor(r"U\+[0-9A-Fa-f]{4,6}", model.tokenizer,
 generator = Generator(model, processor=logits_processor)
 response = generator("What's the unicode for the hugging face emoji")

-print(response) # U+1F917
+print(response.content) # U+1F917
 ```

 ## Creating Custom Logits Processors
@@ -95,5 +95,5 @@ formatted_prompt = tf_tokenizer.apply_chat_template(
 generator = Generator(model, processor=logits_processor)
 response = generator(formatted_prompt)

-print(response) # "101111"
+print(response.content) # "101111"
 ```

docs/features/core/generator.md

Lines changed: 4 additions & 4 deletions
@@ -47,7 +47,7 @@ generator = Generator(model)
 result = generator("Write a short poem about AI.")

 # Print the result
-print(result)
+print(result.content)
 ```

 ## Structured Generation
@@ -77,7 +77,7 @@ generator = Generator(model, BookRecommendation)
 result = generator("Recommend a science fiction book.")

 # Parse the JSON result into a Pydantic model
-book = BookRecommendation.model_validate_json(result)
+book = BookRecommendation.model_validate_json(result.content)
 print(f"{book.title} by {book.author} ({book.year})")
 ```

@@ -109,7 +109,7 @@ result = generator(

 ## Return Value

-The generator always returns a raw string containing the generated text. When generating structured outputs, you need to parse this string into the desired format.
+The generator returns an `Output` instance (or an iterator of `StreamingOutput` instances in the case of streaming). The `content` field contains the generated text as a string. When generating structured outputs, you need to parse this string into the desired format.

 Unlike in Outlines v0, where the return type could be a parsed object, in v1 you are responsible for parsing the output when needed:

@@ -126,7 +126,7 @@ generator = Generator(model, Person)
 result = generator("Generate a person:")

 # Parse the result yourself
-person = Person.model_validate_json(result)
+person = Person.model_validate_json(result.content)
 ```

 ::: outlines.generator.Generator

docs/features/core/inputs.md

Lines changed: 19 additions & 11 deletions
@@ -32,7 +32,7 @@ model = outlines.from_transformers(

 # Simple text prompt
 response = model("What's the capital of France?", max_new_tokens=20)
-print(response) # 'Paris'
+print(response.content) # 'Paris'
 ```

 ## Multimodal Inputs (Vision)
@@ -76,16 +76,22 @@ prompt = [

 # Call the model to generate a response
 response = model(prompt, max_tokens=50)
-print(response) # 'This is a picture of a black dog.'
+print(response.content) # 'This is a picture of a black dog.'
 ```

 ## Chat Inputs

 For conversational models, you can use the `Chat` class to provide a conversation history with multiple messages.

-A `Chat` instance is instantiated with an optional list of messages. Each message must be a dictionary containing two mandatory keys:
-- `role`: must be one of `system`, `assistant` or `user`
-- `content`: must be either a string or a multimodal input (if the model supports it)
+A `Chat` is instantiated with an optional list of messages. The type of each message is defined by the value of the mandatory `role` key. There are four types of messages, each with its own associated keys:
+- `system`: system instructions giving the LLM context on the task to perform. The only other key is `content` (mandatory).
+- `user`: a message from you in the conversation. The only other key is `content` (mandatory).
+- `assistant`: a response from the LLM. The other keys are `content` and `tool_calls` (a list of `ToolCall` instances). At least one of the two must be provided.
+- `tool`: a tool call response. The other keys are `content` (mandatory), `tool_name` and `tool_call_id`. Depending on the model you are using, one of the latter two is mandatory.
+
+Support for the message types and fields described above depends on the capabilities of the model you are using; tool calling, for instance, is currently limited to a few models. To learn more about tools, consult the dedicated section on [tools](./tools.md).
+
+An `Output` instance returned by a model can also be added to a `Chat`. It will automatically be turned into an assistant message. To learn more about model outputs, consult the dedicated section on [outputs](./outputs.md).

 For instance:

@@ -149,13 +155,15 @@ print(prompt)
 # {'role': 'assistant', 'content': 'Excellent, thanks!'}
 ```

-Finally, there are three convenience method to easily add a message:
+There are five convenience methods to easily add a message:

-- add_system_message
-- add_user_message
-- add_assistant_message
+- `add_system_message`
+- `add_user_message`
+- `add_assistant_message`
+- `add_tool_message`
+- `add_output`

-As the role is already set, you only need to provide the content.
+As the role is already set, you only need to provide values for the other keys of the message type; for `add_output`, you just provide the output of the model call.

 For instance:

@@ -200,5 +208,5 @@ prompts = [

 # Call it to generate text
 result = model.batch(prompts, max_new_tokens=20)
-print(result) # ['Vilnius', 'Riga', 'Tallinn']
+print([item.content for item in result]) # ['Vilnius', 'Riga', 'Tallinn']
 ```

docs/features/core/output_types.md

Lines changed: 5 additions & 5 deletions
@@ -48,9 +48,9 @@ def create_character() -> Character:
 With an Outlines model, you can generate text that respects the type hints above by providing those as the output type:

 ```python
-model("How many minutes are there in one hour", int) # "60"
-model("Pizza or burger", Literal["pizza", "burger"]) # "pizza"
-model("Create a character", Character, max_new_tokens=100) # '{"name": "James", "birth_date": "1980-05-10", "skills": ["archery", "negotiation"]}'
+model("How many minutes are there in one hour", int).content # "60"
+model("Pizza or burger", Literal["pizza", "burger"]).content # "pizza"
+model("Create a character", Character, max_new_tokens=100).content # '{"name": "James", "birth_date": "1980-05-10", "skills": ["archery", "negotiation"]}'
 ```

 An important difference with function type hints, though, is that the `content` of the output returned by an Outlines model is always a string.
@@ -61,8 +61,8 @@ For instance:
 ```python
 result = model("Create a character", Character, max_new_tokens=100)
 casted_result = Character.model_validate_json(result.content)
-print(result) # '{"name": "Aurora", "birth_date": "1990-06-15", "skills": ["Stealth", "Diplomacy"]}'
-print(casted_result) # name=Aurora birth_date=datetime.date(1990, 6, 15) skills=['Stealth', 'Diplomacy']
+print(result.content) # '{"name": "Aurora", "birth_date": "1990-06-15", "skills": ["Stealth", "Diplomacy"]}'
+print(casted_result) # name=Aurora birth_date=datetime.date(1990, 6, 15) skills=['Stealth', 'Diplomacy']
 ```

 ## Output Type Categories

docs/features/core/outputs.md

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
---
title: Outputs
---

# Outputs

## Overview

Outlines uses two objects to contain model responses: `Output` and `StreamingOutput`.

They both have two fields:

- `content`: the raw text response returned by the model
- `tool_calls`: a list of `ToolCallOutput` or `StreamingToolCallOutput` instances if the model decided to call a tool instead of giving a response directly. This field can only have a value if you provided a list of tools to the model in the first place.

To access the text response from the model, you would thus typically just read `response.content`. In the case of streaming, each item gives you a chunk of the response.
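
For instance, here is a minimal sketch of consuming a streaming response, assuming an OpenAI model and a `stream()` method as the streaming entry point (check the documentation of your model for actual streaming support):

```python
import openai
from outlines import from_openai

model = from_openai(openai.OpenAI(), "gpt-4o")

# Each item yielded is assumed to be a StreamingOutput whose
# `content` field holds one chunk of the full text response.
chunks = []
for chunk in model.stream("What's the capital of Latvia?"):
    chunks.append(chunk.content)

print("".join(chunks))  # 'The capital of Latvia is Riga.'
```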

## Chat

If you are using a `Chat` input to call the model, you can add the `Output` you received from the model to your `Chat` instance; this adds a new message that will be part of the conversation provided to the model the next time you call it.

For instance:

```python
import transformers
import outlines
from outlines.inputs import Chat

MODEL_ID = "microsoft/Phi-3-mini-4k-instruct"

model = outlines.from_transformers(
    transformers.AutoModelForCausalLM.from_pretrained(MODEL_ID),
    transformers.AutoTokenizer.from_pretrained(MODEL_ID),
)

# Initialize the chat with a system message.
chat_prompt = Chat([
    {"role": "system", "content": "You are a helpful assistant."},
])

# Add a user message to the chat.
chat_prompt.add_user_message("What's the capital of Latvia?")

# Call the model with the chat input.
response = model(chat_prompt)
print(response.content) # 'The capital of Latvia is Riga.'

# Add the output to the chat.
chat_prompt.add_output(response)

# Add another user message to the chat and call the model again.
chat_prompt.add_user_message("How many inhabitants does it have?")
response = model(chat_prompt)
print(response.content) # '600,000'
```

## Tool Calls

As described above, the output you receive from the model can contain a list of `ToolCallOutput` or `StreamingToolCallOutput` instances in the `tool_calls` field if the model decided to call tools first.

A `ToolCallOutput` or `StreamingToolCallOutput` contains three fields:
- `name`: the name of the tool to call
- `id`: the id of the tool call. If provided, it should typically be included in the tool message containing the tool response you add to the `Chat`
- `args`: the arguments to provide to the tool. This is a dictionary for regular calls and a string for streaming calls (as it may contain only a chunk of the whole args)

See the section on [tools](./tools.md) for an explanation of how to use the `ToolCallOutput` to make a tool call.
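
As a quick illustrative sketch (the tool function and registry below are hypothetical, not part of the Outlines API), a `ToolCallOutput` can be dispatched to the matching Python function like this:

```python
# Hypothetical tool implementation and registry, keyed by tool name.
def get_weather(city: str, hour: int | None = None):
    return "20 degrees"

TOOLS = {"get_weather": get_weather}

# `response` is assumed to be an Output whose `tool_calls` is non-empty.
for tool_call in response.tool_calls:
    result = TOOLS[tool_call.name](**tool_call.args)  # args is a dict here
    print(tool_call.id, result)
```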

docs/features/core/tools.md

Lines changed: 164 additions & 0 deletions
@@ -0,0 +1,164 @@
---
title: Tools
---

# Tools

## Overview

Some models support tool calling: instead of directly providing its final response, the model can request calls to tools you have defined and later use the tool responses in its final answer. Tool calling typically goes along with a `Chat` input, as it implies a multi-turn conversation with the model.

For the moment, tool calling is supported by three Outlines models:

- `Anthropic`
- `Gemini`
- `OpenAI`

## Tool Definition

Using tool calling starts with defining the tools the model can call. Three formats are currently supported, as described below.

Once defined, the tools must be provided as a list through the `tools` keyword argument of the `Generator` constructor or of a model's text generation methods. As such, the interface for `tools` is very similar to that of `output_type`.

#### ToolDef

A tool can first be defined as a dictionary. A `ToolDef` dict must contain the following keys:

- `name`: The name of the tool
- `description`: A description of the tool to help the LLM understand its use
- `parameters`: A dictionary containing the parameters of the tool, using the JSON Schema properties format. If the LLM decides to call the tool, it will provide values for the parameters
- `required`: A list of parameters that are mandatory. All of those parameters must be included in the `parameters` key described above

For instance:

```python
import openai
from outlines import from_openai
from outlines.inputs import Chat
from outlines.tools import ToolDef

client = openai.OpenAI()
model = from_openai(client, "gpt-4o")

chat = Chat([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather in Tokyo?"},
])

weather_tool = ToolDef(
    name="get_weather",
    description="Give the weather for a given city, and optionally for a specific hour of the day",
    parameters={"city": {"type": "string"}, "hour": {"type": "integer"}},
    required=["city"],
)

response = model(chat, tools=[weather_tool])
print(response.tool_calls) # [ToolCallOutput(name='get_weather', id='call_p7ToNwgrgoEk9poN7PXTELT5', args={'city': 'Tokyo'})]
```

#### Function

A Python function can also be used as a tool definition. The `description` then corresponds to the docstring, while `parameters` and `required` are deduced from the signature.

```python
import openai
from outlines import from_openai
from outlines.inputs import Chat
from typing import Optional

client = openai.OpenAI()
model = from_openai(client, "gpt-4o")

chat = Chat([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather in Tokyo?"},
])

def get_weather(city: str, hour: Optional[int] = None):
    """Give the weather for a given city, and optionally for a specific hour of the day"""
    pass

response = model(chat, tools=[get_weather])
print(response.tool_calls) # [ToolCallOutput(name='get_weather', id='call_IdsfmBss6XhiBDbchTqp3HHz', args={'city': 'Tokyo'})]
```

#### Pydantic model

Lastly, you can use a Pydantic model to define the interface of your tool.

```python
import openai
from outlines import from_openai
from outlines.inputs import Chat
from pydantic import BaseModel
from typing import Optional

client = openai.OpenAI()
model = from_openai(client, "gpt-4o")

chat = Chat([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather in Tokyo?"},
])

class GetWeather(BaseModel):
    """Give the weather for a given city, and optionally for a specific hour of the day"""
    city: str
    hour: Optional[int] = None

response = model(chat, tools=[GetWeather])
print(response.tool_calls) # [ToolCallOutput(name='GetWeather', id='call_KWfADMEr6dnDDcw1m2dllRvq', args={'city': 'Tokyo'})]
```

## Tool Calls and Responses

If the model decides to call a tool, you'll get a value for the `tool_calls` attribute of the `Output` received. This value is a list of `ToolCallOutput` instances, each containing three attributes:

- `name`: The name of the tool to call
- `id`: The id of the tool call, making it easy to link the tool call to the tool response
- `args`: A dictionary mapping each parameter required by the tool to the value provided by the LLM

You should use the `name` and the `args` to call your tool yourself and get its response. Afterward, add the `Output` you first received to your chat along with a tool message before calling the model again to continue the conversation.

For instance:

```python
import openai
from outlines import Generator, from_openai
from outlines.inputs import Chat
from typing import Optional

# Our tool
def get_weather(city: str, hour: Optional[int] = None):
    """Give the weather for a given city, and optionally for a specific hour of the day"""
    return "20 degrees"

client = openai.OpenAI()
model = from_openai(client, "gpt-4o")
generator = Generator(model, tools=[get_weather])

chat = Chat([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather in Tokyo?"},
])

response = generator(chat)
print(response.tool_calls) # [ToolCallOutput(name='get_weather', id='call_NlIGHr8HoiVgSZfOJ7Y5xz35', args={'city': 'Tokyo'})]

# Add the model response to the chat
chat.add_output(response)

# Call the tool with the parameters given by the model and add a tool message to the chat
tool_call = response.tool_calls[0]
tool_response = get_weather(**tool_call.args)
chat.add_tool_message(
    content=tool_response,
    tool_name=tool_call.name,
    tool_call_id=tool_call.id
)

response = generator(chat)
print(response.content) # The weather in Tokyo is currently 20 degrees.
```

When using streaming, the response is a `StreamingOutput` and the `tool_calls` value is a list of `StreamingToolCallOutput` instances. The only difference compared to what is described above is that the `args` field is a string, as the value is received in chunks. You need to concatenate the chunks to obtain the full `args` before calling the tool.
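
Here is a minimal sketch of that concatenation, reusing the `generator`, `chat` and `get_weather` from the example above, and assuming a `stream()` entry point that yields `StreamingOutput` chunks carrying at most one tool call each, with the concatenated `args` forming a JSON object string:

```python
import json

# Accumulate the streamed args string chunk by chunk.
args_buffer = ""
for chunk in generator.stream(chat):
    if chunk.tool_calls:
        args_buffer += chunk.tool_calls[0].args

# Once the stream is exhausted, parse the full args and call the tool.
args = json.loads(args_buffer)
tool_response = get_weather(**args)
```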
