## README.md (+12 −8)
````diff
@@ -46,14 +46,14 @@ RubyLLM fixes all that. One beautiful API for everything. One consistent format.
 chat = RubyLLM.chat
 chat.ask "What's the best way to learn Ruby?"
 
-# Analyze images
-chat.ask "What's in this image?", with: { image: "ruby_conf.jpg" }
+# Analyze images, audio, documents, and text files
+chat.ask "What's in this image?", with: "ruby_conf.jpg"
+chat.ask "Describe this meeting", with: "meeting.wav"
+chat.ask "Summarize this document", with: "contract.pdf"
+chat.ask "Explain this code", with: "app.rb"
 
-# Analyze audio recordings
-chat.ask "Describe this meeting", with: { audio: "meeting.wav" }
-
-# Analyze documents
-chat.ask "Summarize this document", with: { pdf: "contract.pdf" }
+# Multiple files at once - types automatically detected
+chat.ask "Analyze these files", with: ["diagram.png", "report.pdf", "notes.txt"]
 
 # Stream responses in real-time
 chat.ask "Tell me a story about a Ruby programmer" do |chunk|
````
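As a rough illustration of the "types automatically detected" behavior above, here is a plain-Ruby sketch of extension-based type mapping. The `TYPE_MAP` table and `detect_type` method are hypothetical stand-ins, not RubyLLM's internals (which also inspect content types):

```ruby
# Hypothetical sketch: map a file extension to an attachment category.
TYPE_MAP = {
  %w[.png .jpg .jpeg .gif .webp] => :image,
  %w[.wav .mp3 .m4a .flac]       => :audio,
  %w[.pdf]                       => :pdf,
  %w[.txt .md .rb .py .js]       => :text
}.freeze

def detect_type(path)
  ext = File.extname(path).downcase
  TYPE_MAP.each { |exts, type| return type if exts.include?(ext) }
  :unknown
end

%w[ruby_conf.jpg meeting.wav contract.pdf app.rb].map { |f| detect_type(f) }
# => [:image, :audio, :pdf, :text]
```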
````diff
@@ -90,7 +90,7 @@ chat.with_tool(Weather).ask "What's the weather in Berlin? (52.5200, 13.4050)"
 * 💬 **Unified Chat:** Converse with models from OpenAI, Anthropic, Gemini, Bedrock, OpenRouter, DeepSeek, Ollama, or any OpenAI-compatible API using `RubyLLM.chat`.
 * 👁️ **Vision:** Analyze images within chats.
 * 🔊 **Audio:** Transcribe and understand audio content.
-* 📄 **PDF Analysis:** Extract information and summarize PDF documents.
+* 📄 **Document Analysis:** Extract information from PDFs, text files, and other documents.
 * 🖼️ **Image Generation:** Create images with `RubyLLM.paint`.
 * 📊 **Embeddings:** Generate text embeddings for vector search with `RubyLLM.embed`.
 * 🔧 **Tools (Function Calling):** Let AI models call your Ruby code using `RubyLLM::Tool`.
````
````diff
@@ -143,6 +143,10 @@ end
 # Now interacting with a Chat record persists the conversation:
````
## docs/configuration.md (+12 −20)
````diff
@@ -31,13 +31,12 @@ After reading this guide, you will know:
 
 ## Global Configuration (`RubyLLM.configure`)
 
-{: .warning }
-> Native OpenRouter and Ollama support is coming in v1.3.0
->
-> Consider using `openai_api_base` in the meantime.
-
 The primary way to configure RubyLLM is using the `RubyLLM.configure` block. This typically runs once when your application starts (e.g., in `config/initializers/ruby_llm.rb` for Rails apps, or at the top of a script).
 
+RubyLLM provides sensible defaults, so you only need to configure what you really need.
+
+Here's a reference of all the configuration options RubyLLM provides:
+
 ```ruby
 require 'ruby_llm'
````
````diff
@@ -78,7 +77,13 @@ RubyLLM.configure do |config|
   config.retry_interval = 0.1       # Initial delay in seconds (default: 0.1)
   config.retry_backoff_factor = 2   # Multiplier for subsequent retries (default: 2)
 ```
 
-You only need to set the API keys for the providers you actually plan to use. Attempting to use an unconfigured provider will result in a `RubyLLM::ConfigurationError`.
+You only need to set the configuration options you need and the API keys for the providers you actually plan to use. Attempting to use an unconfigured provider will result in a `RubyLLM::ConfigurationError`.
 
 ## Provider API Keys
````
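The retry options above compose as exponential backoff: the delay before attempt *n* is `retry_interval * retry_backoff_factor**(n - 1)`. A self-contained sketch of that arithmetic (illustrative of the documented semantics only, not RubyLLM's internal retry code; real clients typically also add jitter and cap the delay):

```ruby
# Compute the backoff schedule implied by the two retry settings.
def retry_delays(retry_interval, retry_backoff_factor, attempts)
  (1..attempts).map { |n| retry_interval * retry_backoff_factor**(n - 1) }
end

retry_delays(0.1, 2, 5)
# => [0.1, 0.2, 0.4, 0.8, 1.6]
```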
````diff
@@ -122,10 +127,6 @@ end
 This setting redirects requests made with `provider: :openai` to your specified base URL. See the [Working with Models Guide]({% link guides/models.md %}#connecting-to-custom-endpoints--using-unlisted-models) for more details on using custom models with this setting.
 
 ## Optional OpenAI Headers
-{: .d-inline-block }
-
-Coming in v1.3.0
-{: .label .label-yellow }
 
 OpenAI supports additional headers for organization and project management:
````
````diff
@@ -157,11 +158,6 @@ Fine-tune how RubyLLM handles HTTP connections and retries.
 Adjust these based on network conditions and provider reliability.
 
 ## Logging Settings
-{: .d-inline-block }
-
-Coming in v1.3.0
-{: .label .label-yellow }
-
 RubyLLM provides flexible logging configuration to help you monitor and debug API interactions. You can configure both the log file location and the logging level.
 
 ```ruby
````
````diff
@@ -186,10 +182,6 @@ end
 You can also set the debug level by setting the `RUBYLLM_DEBUG` environment variable to `true`.
 
 ## Scoped Configuration with Contexts
-{: .d-inline-block }
-
-Coming in v1.3.0
-{: .label .label-yellow }
 
 While `RubyLLM.configure` sets global defaults, `RubyLLM.context` allows you to create temporary, isolated configuration scopes for specific API calls. This is ideal for situations requiring different keys, endpoints, or timeouts temporarily without affecting the rest of the application.
````
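The scoped-configuration idea can be sketched in plain Ruby: duplicate the global configuration, apply overrides to the copy, and leave the global untouched. The `Config` struct and `with_overrides` helper below are illustrative stand-ins, not RubyLLM internals:

```ruby
# Hypothetical global config object with two of the documented options.
Config = Struct.new(:request_timeout, :openai_api_key)
GLOBAL_CONFIG = Config.new(120, "global-key")

# Build an isolated copy with per-call overrides applied.
def with_overrides(base, **overrides)
  scoped = base.dup
  overrides.each { |key, value| scoped[key] = value }
  scoped
end

scoped = with_overrides(GLOBAL_CONFIG, request_timeout: 10)
scoped.request_timeout        # => 10
GLOBAL_CONFIG.request_timeout # => 120 (global default unchanged)
```

Because the scope is just a separate object, nothing leaks between concurrent calls that use different scopes.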
## docs/guides/chat.md (+52 −10)
````diff
@@ -119,7 +119,7 @@ RubyLLM manages a registry of known models and their capabilities. For detailed
 
 ## Multi-modal Conversations
 
-Modern AI models can often process more than just text. RubyLLM provides a unified way to include images, audio, and even PDFs in your chat messages using the `with:` option in the `ask` method.
+Modern AI models can often process more than just text. RubyLLM provides a unified way to include images, audio, text files, and PDFs in your chat messages using the `with:` option in the `ask` method.
 
 ### Working with Images
````
````diff
@@ -130,17 +130,15 @@ Provide image paths or URLs to vision-capable models (like `gpt-4o`, `claude-3-o
 chat = RubyLLM.chat(model: 'gpt-4o')
 
 # Ask about a local image file
-response = chat.ask "Describe this logo.", with: { image: "path/to/ruby_logo.png" }
+response = chat.ask "Describe this logo.", with: "path/to/ruby_logo.png"
 puts response.content
 
 # Ask about an image from a URL
-response = chat.ask "What kind of architecture is shown here?", with: { image: "https://example.com/eiffel_tower.jpg" }
+response = chat.ask "What kind of architecture is shown here?", with: "https://example.com/eiffel_tower.jpg"
 puts response.content
 
 # Send multiple images
-response = chat.ask "Compare the user interfaces in these two screenshots.", with: {
-  image: ["screenshot_v1.png", "screenshot_v2.png"]
-}
+response = chat.ask "Compare the user interfaces in these two screenshots.", with: ["screenshot_v1.png", "screenshot_v2.png"]
 puts response.content
 ```
````
````diff
@@ -154,14 +152,30 @@ Provide audio file paths to audio-capable models (like `gpt-4o-audio-preview`).
 chat = RubyLLM.chat(model: 'gpt-4o-audio-preview') # Use an audio-capable model
 response = chat.ask "Please transcribe this meeting recording.", with: "path/to/meeting.mp3"
 puts response.content
 
 # Ask follow-up questions based on the audio context
 response = chat.ask "What were the main action items discussed?"
 puts response.content
 ```
 
+### Working with Text Files
+
+Provide text file paths to models that support document analysis.
+
+```ruby
+chat = RubyLLM.chat(model: 'claude-3-5-sonnet')
+
+# Analyze a text file
+response = chat.ask "Summarize the key points in this document.", with: "path/to/document.txt"
+puts response.content
+
+# Ask questions about code files
+response = chat.ask "Explain what this Ruby file does.", with: "app/models/user.rb"
+puts response.content
+```
+
 ### Working with PDFs
 
 Provide PDF paths or URLs to models that support document analysis (currently Claude 3+ and Gemini models).
````
````diff
@@ -171,21 +185,49 @@ Provide PDF paths or URLs to models that support document analysis (currently Cl
 chat = RubyLLM.chat(model: 'claude-3-7-sonnet')
 
 # Ask about a local PDF
-response = chat.ask "Summarize the key findings in this research paper.", with: { pdf: "path/to/paper.pdf" }
+response = chat.ask "Summarize the key findings in this research paper.", with: "path/to/paper.pdf"
 puts response.content
 
 # Ask about a PDF via URL
-response = chat.ask "What are the terms and conditions outlined here?", with: { pdf: "https://example.com/terms.pdf" }
+response = chat.ask "What are the terms and conditions outlined here?", with: "https://example.com/terms.pdf"
 puts response.content
 
 # Combine text and PDF context
-response = chat.ask "Based on section 3 of this document, what is the warranty period?", with: { pdf: "manual.pdf" }
+response = chat.ask "Based on section 3 of this document, what is the warranty period?", with: "manual.pdf"
 puts response.content
 ```
 
 {: .note }
 **PDF Limitations:** Be mindful of provider-specific limits. For example, Anthropic Claude models currently have a 10MB per-file size limit, and the total size/token count of all PDFs must fit within the model's context window (e.g., 200,000 tokens for Claude 3 models).
 
+### Simplified Attachment API
+
+RubyLLM automatically detects file types based on extensions and content, so you can pass files directly without specifying the type:
+
+```ruby
+chat = RubyLLM.chat(model: 'claude-3-5-sonnet')
+
+# Single file - type automatically detected
+response = chat.ask "What's in this file?", with: "path/to/document.pdf"
+
+# Multiple files of different types
+response = chat.ask "Analyze these files", with: [
+  "diagram.png",
+  "report.pdf",
+  "meeting_notes.txt",
+  "recording.mp3"
+]
+
+# Still works with the explicit hash format if needed
+response = chat.ask "What's in this image?", with: { image: "photo.jpg" }
+```
+
+**Code:** .rb, .py, .js, .html, .css (and many others)
+
 ## Controlling Creativity: Temperature
 
 The `temperature` setting influences the randomness and creativity of the AI's responses. A higher value (e.g., 0.9) leads to more varied and potentially surprising outputs, while a lower value (e.g., 0.1) makes the responses more focused, deterministic, and predictable. The default is generally around 0.7.
````
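Conceptually, temperature divides the model's logits before a softmax, so low values sharpen the output distribution and high values flatten it. A minimal numeric illustration of that general technique (not RubyLLM code — providers apply this server-side):

```ruby
# softmax(logits / temperature): the standard temperature-scaled sampling weights.
def softmax(logits, temperature)
  scaled = logits.map { |l| Math.exp(l / temperature) }
  total  = scaled.sum
  scaled.map { |s| s / total }
end

logits = [2.0, 1.0, 0.1]
softmax(logits, 0.1) # nearly all probability mass on the top logit
softmax(logits, 1.0) # moderate spread
softmax(logits, 2.0) # flatter distribution, more "creative" sampling
```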
````diff
+# Use a model not in the registry (useful for custom endpoints)
+embedding_custom = RubyLLM.embed(
+  "Custom model test",
+  model: "my-custom-embedding-model",
+  provider: :openai,
+  assume_model_exists: true
+)
 ```
 
 You can configure the default embedding model globally:
````
````diff
@@ -93,10 +101,6 @@ end
 Refer to the [Working with Models Guide]({% link guides/models.md %}) for details on finding available embedding models and their capabilities.
 
 ## Choosing Dimensions
-{: .d-inline-block }
-
-Coming in v1.3.0
-{: .label .label-yellow }
 
 Each embedding model has its own default output dimensions. For example, OpenAI's `text-embedding-3-small` outputs 1536 dimensions by default, while `text-embedding-3-large` outputs 3072 dimensions. RubyLLM allows you to specify these dimensions per request:
````
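Embedding vectors are plain arrays of floats, and vector search typically compares them with cosine similarity — which is also why the query vector and the stored vectors must use the same dimensions. A self-contained sketch of the comparison (not part of RubyLLM's API):

```ruby
# Cosine similarity: dot product of the vectors divided by the product
# of their magnitudes. 1.0 = same direction, 0.0 = orthogonal.
def cosine_similarity(a, b)
  dot   = a.zip(b).sum { |x, y| x * y }
  mag_a = Math.sqrt(a.sum { |x| x * x })
  mag_b = Math.sqrt(b.sum { |x| x * x })
  dot / (mag_a * mag_b)
end

cosine_similarity([1.0, 0.0], [1.0, 0.0]) # => 1.0
cosine_similarity([1.0, 0.0], [0.0, 1.0]) # => 0.0
```

Vectors of different lengths can't be compared this way, so pick one model and dimension setting per index and keep it consistent.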
## docs/guides/rails.md (+3 −15)
````diff
@@ -50,7 +50,7 @@ This two-phase approach (create empty → update with content) is intentional an
 
 1. **Streaming-first design**: By creating the message record before the API call, your UI can immediately show a "thinking" state and have a DOM target ready for incoming chunks.
 2. **Turbo Streams compatibility**: Works perfectly with `after_create_commit { broadcast_append_to ... }` for real-time updates.
-3. **Clean rollback on failure**: If the API call fails, the empty message is automatically removed.
+3. **Clean rollback on failure**: If the API call fails, the empty assistant message is automatically removed, preventing orphaned records that could cause issues with providers like Gemini that reject empty messages.
 
 ### Content Validation Implications
````
````diff
@@ -118,10 +118,6 @@ end
 Run the migrations: `rails db:migrate`
 
 ### ActiveStorage Setup for Attachments (Optional)
-{: .d-inline-block }
-
-Coming in v1.3.0
-{: .label .label-yellow }
 
 If you want to use attachments (images, audio, PDFs) with your AI chats, you need to set up ActiveStorage:
````
If you've set up ActiveStorage as described above, you can easily send attachments to AI models with automatic type detection:

````diff
@@ -290,22 +282,18 @@ chat_record.ask("What's in this file?", with: "app/assets/images/diagram.png")
 chat_record.ask("What are in these files?", with: [
   "app/assets/documents/report.pdf",
   "app/assets/images/chart.jpg",
+  "app/assets/text/notes.txt",
   "app/assets/audio/recording.mp3"
 ])
 
-# Still works with manually categorized hash (backward compatible)
-chat_record.ask("What's in this image?", with: {
-  image: "app/assets/images/diagram.png"
-})
-
 # Works with file uploads from forms
 chat_record.ask("Analyze this file", with: params[:uploaded_file])
 
 # Works with existing ActiveStorage attachments
 chat_record.ask("What's in this document?", with: user.profile_document)
 ```
 
-The attachment API automatically detects file types based on file extension or content type, so you don't need to specify whether something is an image, audio file, or PDF - RubyLLM figures it out for you!
+The attachment API automatically detects file types based on file extension or content type, so you don't need to specify whether something is an image, audio file, PDF, or text document - RubyLLM figures it out for you!
````
## docs/index.md (+12 −8)
````diff
@@ -72,14 +72,14 @@ RubyLLM fixes all that. One beautiful API for everything. One consistent format.
 chat = RubyLLM.chat
 chat.ask "What's the best way to learn Ruby?"
 
-# Analyze images
-chat.ask "What's in this image?", with: { image: "ruby_conf.jpg" }
+# Analyze images, audio, documents, and text files
+chat.ask "What's in this image?", with: "ruby_conf.jpg"
+chat.ask "Describe this meeting", with: "meeting.wav"
+chat.ask "Summarize this document", with: "contract.pdf"
+chat.ask "Explain this code", with: "app.rb"
 
-# Analyze audio recordings
-chat.ask "Describe this meeting", with: { audio: "meeting.wav" }
-
-# Analyze documents
-chat.ask "Summarize this document", with: { pdf: "contract.pdf" }
+# Multiple files at once - types automatically detected
+chat.ask "Analyze these files", with: ["diagram.png", "report.pdf", "notes.txt"]
 
 # Stream responses in real-time
 chat.ask "Tell me a story about a Ruby programmer" do |chunk|
````
````diff
@@ -116,7 +116,7 @@ chat.with_tool(Weather).ask "What's the weather in Berlin? (52.5200, 13.4050)"
 * 💬 **Unified Chat:** Converse with models from OpenAI, Anthropic, Gemini, Bedrock, OpenRouter, DeepSeek, Ollama, or any OpenAI-compatible API using `RubyLLM.chat`.
 * 👁️ **Vision:** Analyze images within chats.
 * 🔊 **Audio:** Transcribe and understand audio content.
-* 📄 **PDF Analysis:** Extract information and summarize PDF documents.
+* 📄 **Document Analysis:** Extract information from PDFs, text files, and other documents.
 * 🖼️ **Image Generation:** Create images with `RubyLLM.paint`.
 * 📊 **Embeddings:** Generate text embeddings for vector search with `RubyLLM.embed`.
 * 🔧 **Tools (Function Calling):** Let AI models call your Ruby code using `RubyLLM::Tool`.
````
````diff
@@ -169,5 +169,9 @@ end
 # Now interacting with a Chat record persists the conversation:
````