28 commits
d917ed7
merged main but still need to handle multipart connection for image e…
sbounmy Apr 27, 2025
f002e00
support multiple images
sbounmy Apr 27, 2025
4d113ae
added image attachments
sbounmy Apr 28, 2025
fd85df5
fixed specs
sbounmy Apr 28, 2025
f5c0c81
store tokens on image
sbounmy Apr 30, 2025
bdb16cb
pass model so we can compute image#cost
sbounmy Apr 30, 2025
e765cad
able to specify options in RubyLLM#edit
sbounmy May 5, 2025
67efbba
Merge branch 'main' of github.com:crmne/ruby_llm into paint-support-w…
sbounmy May 12, 2025
7a55a30
update capabilities
sbounmy May 12, 2025
5b93e9d
removed error
sbounmy May 12, 2025
a081b90
fix conneciton multipart
sbounmy May 12, 2025
087a149
fix duplicate models json
sbounmy May 12, 2025
1a33ce6
set headers content type
sbounmy May 12, 2025
d09a229
Merge branch 'main' into image-edit
tpaulshippy Aug 4, 2025
8fd7338
Merge branch 'main' into image-edit
tpaulshippy Aug 4, 2025
a68c038
Get specs passing after merge
tpaulshippy Aug 4, 2025
8af8551
Move paint/edit decision into OpenAI provider based on "with" parameter
tpaulshippy Aug 4, 2025
873688d
Get it working with one image
tpaulshippy Aug 4, 2025
bf4909f
Add spec to make sure multiple images works
tpaulshippy Aug 4, 2025
578e2d4
Refactor images.rb a bit and add usage to generation
tpaulshippy Aug 4, 2025
92c387d
Rubocop
tpaulshippy Aug 4, 2025
a0f001d
Add some docs
tpaulshippy Aug 4, 2025
8a1ae01
Merge branch 'main' into image-edit
tpaulshippy Aug 7, 2025
2425cde
Rubocop -A
tpaulshippy Aug 7, 2025
9fa1e38
No more passing connection and config around
tpaulshippy Aug 7, 2025
06b0978
Rubocop -A
tpaulshippy Aug 7, 2025
27c6c70
Merge branch 'main' into image-edit
tpaulshippy Aug 25, 2025
06b5319
Restore line that was inadvertently removed
tpaulshippy Aug 25, 2025
96 changes: 91 additions & 5 deletions docs/_core_features/image-generation.md
@@ -24,12 +24,13 @@ redirect_from:
After reading this guide, you will know:

* How to generate images from text prompts.
* How to edit existing images with AI (local files and remote URLs).
* How to select different image generation models.
* How to specify image sizes (for supported models).
* How to access and save generated image data (URL or Base64).
* How to integrate image generation with Rails Active Storage.
* Tips for writing effective image prompts.
* How to handle errors during image generation.
* How to handle errors during image generation and editing.

## Basic Image Generation

@@ -62,6 +63,86 @@ puts "Model Used: #{image.model_id}"

The `paint` method abstracts the differences between provider APIs.

## Image Editing

RubyLLM supports editing existing images using AI models like OpenAI's `gpt-image-1`. You can provide either local image files or remote URLs as the source image to edit.

### Editing Local Images

To edit a local image file, use the `with:` parameter to specify the path to your image:

```ruby
# Edit a local PNG file
image = RubyLLM.paint(
  "turn the logo to green",
  with: "path/to/your/image.png",
  model: "gpt-image-1"
)

puts "Edited image generated!"
puts "MIME Type: #{image.mime_type}"
```

**Important Requirements for Local Images:**
- Only PNG, WEBP, or JPG files are supported for editing with gpt-image-1
- The file must exist and be readable
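
These requirements can be checked before the request is sent. The following is a minimal sketch and not part of RubyLLM: `validate_edit_source!` and `SUPPORTED_EDIT_EXTENSIONS` are hypothetical names, and accepting `.jpeg` alongside `.jpg` is an assumption.

```ruby
# Hypothetical pre-flight check; not part of RubyLLM.
# Assumption: ".jpeg" is treated the same as ".jpg".
SUPPORTED_EDIT_EXTENSIONS = %w[.png .webp .jpg .jpeg].freeze

def validate_edit_source!(path)
  ext = File.extname(path).downcase
  unless SUPPORTED_EDIT_EXTENSIONS.include?(ext)
    raise ArgumentError, "Unsupported image type #{ext.inspect}; expected PNG, WEBP, or JPG"
  end

  # The file must exist, be a regular file, and be readable by this process.
  unless File.file?(path) && File.readable?(path)
    raise ArgumentError, "File not found or unreadable: #{path}"
  end

  path
end
```

Because the helper returns the path on success, its result can be passed straight to the `with:` option.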

### Editing Remote Images

You can also edit images from remote URLs:

```ruby
# Edit an image from a remote URL
image = RubyLLM.paint(
  "make the background more vibrant",
  with: "https://example.com/image.png",
  model: "gpt-image-1"
)
```

**Requirements for Remote URLs:**
- The URL must return a PNG, WEBP, or JPG image
- The server must respond with a valid content type
- 404 errors will raise a `Faraday::ResourceNotFound` error
- Invalid content types will raise a `RubyLLM::BadRequestError`
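
To illustrate the content-type requirement, here is a small sketch that decides whether a `Content-Type` header value names a supported image format. It is not RubyLLM's own validation: `supported_image_content_type?` and `SUPPORTED_EDIT_MIME_TYPES` are hypothetical names, and the MIME type list is an assumption derived from the formats listed above.

```ruby
# Hypothetical helper; RubyLLM's internal check may differ.
SUPPORTED_EDIT_MIME_TYPES = %w[image/png image/webp image/jpeg].freeze

def supported_image_content_type?(header_value)
  # Drop parameters such as "; charset=binary" and normalise case.
  media_type = header_value.to_s.split(';').first.to_s.strip.downcase
  SUPPORTED_EDIT_MIME_TYPES.include?(media_type)
end
```

A server that answers with `text/html` (for example, an error page) would fail this check even if the URL ends in `.png`.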

### Editing Multiple Images

You can pass several source images in a single request by providing an array of file paths or URLs. The call still returns one edited image:

```ruby
# Use several source images in one edit request
image = RubyLLM.paint(
  "apply a vintage filter to all images",
  with: [
    "path/to/first_image.png",
    "path/to/second_image.png"
  ],
  model: "gpt-image-1"
)
```

### Customizing Edit Parameters

The `params:` option allows you to customize the editing process:

```ruby
# Customize the editing output
image = RubyLLM.paint(
  "enhance the colors and add dramatic lighting",
  with: "path/to/image.png",
  model: "gpt-image-1",
  params: {
    size: "1024x1024", # Output image size
    quality: "low"     # Quality setting for gpt-image-1 (low, medium, high, or auto)
  }
)

# Check usage information
puts "Output tokens: #{image.usage['output_tokens']}"
puts "Total cost: $#{image.total_cost}"
```

## Choosing Models

By default, RubyLLM uses the model specified in `config.default_image_model`, but you can specify a different one.
@@ -258,20 +339,25 @@ See the [Error Handling Guide]({% link _advanced/error-handling.md %}) for compr

## Content Safety

AI image generation services have content safety filters. Prompts requesting harmful, explicit, or otherwise prohibited content will usually result in a `BadRequestError`. Avoid generating:
AI image generation and editing services have content safety filters. Prompts requesting harmful, explicit, or otherwise prohibited content will usually result in a `BadRequestError`. Avoid generating or editing:

* Violent or hateful imagery.
* Sexually explicit content.
* Images of real people (especially public figures without consent, though policies vary).
* Direct copies of copyrighted characters or artwork.

**Additional considerations for image editing:**
* Be mindful of editing copyrighted images without permission.
* Some providers may have stricter policies for editing existing images versus generating new ones.

## Performance Considerations

Image generation can take several seconds (typically 5-20 seconds depending on the model and load).
Image generation and editing can take several seconds (typically 5-20 seconds depending on the model and load).

* **Use Background Jobs:** In web applications, always perform image generation in a background job (like Sidekiq or GoodJob) to avoid blocking web requests.
* **Use Background Jobs:** In web applications, always perform image generation and editing in a background job (like Sidekiq or GoodJob) to avoid blocking web requests.
* **Timeouts:** Configure appropriate network timeouts in RubyLLM (see [Configuration Guide]({% link _getting_started/configuration.md %})).
* **Caching:** Store generated images (e.g., using Active Storage, cloud storage) rather than regenerating them frequently if the prompt is the same.
* **Caching:** Store generated and edited images (e.g., using Active Storage, cloud storage) rather than regenerating them frequently if the prompt is the same.
* **Image Editing Considerations:** When editing remote images, factor in additional time for downloading the source image before processing begins.

## Next Steps

2 changes: 2 additions & 0 deletions lib/ruby_llm.rb
@@ -4,6 +4,8 @@
require 'event_stream_parser'
require 'faraday'
require 'faraday/retry'
require 'faraday/multipart'

require 'json'
require 'logger'
require 'securerandom'
19 changes: 19 additions & 0 deletions lib/ruby_llm/connection_multipart.rb
@@ -0,0 +1,19 @@
# frozen_string_literal: true

module RubyLLM
  # A connection that uses multipart/form-data for file uploads
  class ConnectionMultipart < Connection
    def post(url, payload, &)
      @connection.post url, payload do |req|
        req.headers.merge! @provider.headers if @provider.respond_to?(:headers)
        req.headers['Content-Type'] = 'multipart/form-data'
        yield req if block_given?
      end
    end

    def setup_middleware(faraday)
      super
      faraday.request :multipart, content_type: 'multipart/form-data'
    end
  end
end
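
For readers unfamiliar with the wire format, the sketch below shows roughly what a `multipart/form-data` body looks like and why the final `Content-Type` header needs a `boundary` parameter. It is a hand-rolled illustration using only the standard library. In this PR the encoding is left to the `:multipart` Faraday middleware, and `build_multipart_body` is a hypothetical name.

```ruby
require 'securerandom'

# Illustrative encoder only; not how the PR builds requests.
def build_multipart_body(fields, file:, boundary: SecureRandom.hex(16))
  body = String.new(encoding: Encoding::BINARY)

  # Plain form fields: one part per key/value pair.
  fields.each do |name, value|
    body << "--#{boundary}\r\n".b
    body << "Content-Disposition: form-data; name=\"#{name}\"\r\n\r\n".b
    body << "#{value}\r\n".b
  end

  # File part: carries a filename and its own Content-Type header.
  body << "--#{boundary}\r\n".b
  body << "Content-Disposition: form-data; name=\"#{file[:field]}\"; filename=\"#{file[:filename]}\"\r\n".b
  body << "Content-Type: #{file[:mime_type]}\r\n\r\n".b
  body << file[:content].b << "\r\n".b

  # The closing delimiter ends with two extra dashes.
  body << "--#{boundary}--\r\n".b

  ["multipart/form-data; boundary=#{boundary}", body]
end

content_type, body = build_multipart_body(
  { model: 'gpt-image-1', prompt: 'turn the logo green' },
  file: { field: 'image', filename: 'logo.png', mime_type: 'image/png', content: 'PNG-BYTES' },
  boundary: 'demo-boundary'
)
puts content_type
puts body
```

The receiver can only split the parts if it knows the boundary string, which is why a bare `multipart/form-data` header is not enough on the wire.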
27 changes: 23 additions & 4 deletions lib/ruby_llm/image.rb
@@ -3,14 +3,15 @@
module RubyLLM
  # Represents a generated image from an AI model.
  class Image
    attr_reader :url, :data, :mime_type, :revised_prompt, :model_id
    attr_reader :url, :data, :mime_type, :revised_prompt, :model_id, :usage

    def initialize(url: nil, data: nil, mime_type: nil, revised_prompt: nil, model_id: nil)
    def initialize(url: nil, data: nil, mime_type: nil, revised_prompt: nil, model_id: nil, usage: {}) # rubocop:disable Metrics/ParameterLists
      @url = url
      @data = data
      @mime_type = mime_type
      @revised_prompt = revised_prompt
      @model_id = model_id
      @usage = usage
    end

    def base64?
@@ -36,14 +37,32 @@ def self.paint(prompt, # rubocop:disable Metrics/ParameterLists
                   provider: nil,
                   assume_model_exists: false,
                   size: '1024x1024',
                   context: nil)
                   context: nil,
                   with: nil,
                   params: {})
      config = context&.config || RubyLLM.config
      model ||= config.default_image_model
      model, provider_instance = Models.resolve(model, provider: provider, assume_exists: assume_model_exists,
                                                config: config)
      model_id = model.id

      provider_instance.paint(prompt, model: model_id, size:)
      provider_instance.paint(prompt, model: model_id, size:, with:, params:)
    end

    def total_cost
      input_cost + output_cost
    end

    def model_info
      @model_info ||= RubyLLM.models.find(model_id)
    end

    def input_cost
      usage['input_tokens'] * model_info.input_price_per_million / 1_000_000
    end

    def output_cost
      usage['output_tokens'] * model_info.output_price_per_million / 1_000_000
    end
  end
end
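
A note on the cost methods above: each one multiplies a token count by a price per million tokens and divides by 1,000,000. As written they index `usage` directly, so a response without a `usage` field (which the parsers would pass through as `nil`) or without a given token key would raise. The sketch below shows the same arithmetic with those cases treated as zero tokens. `token_cost` is a hypothetical stand-alone helper and the prices are illustrative values, not verified provider pricing.

```ruby
# Hypothetical stand-alone helper; not the PR's implementation.
def token_cost(usage, key, price_per_million)
  tokens = usage.to_h[key].to_i # nil usage or a missing key counts as 0 tokens
  tokens * price_per_million.to_f / 1_000_000
end

usage = { 'input_tokens' => 300, 'output_tokens' => 1_000 }

# Illustrative prices in USD per million tokens (assumed values).
input_cost = token_cost(usage, 'input_tokens', 5.0)
output_cost = token_cost(usage, 'output_tokens', 8.0)
total_cost = input_cost + output_cost

puts format('input: $%.4f, output: $%.4f, total: $%.4f', input_cost, output_cost, total_cost)
```

Whether a missing `usage` should count as zero cost or be reported as unknown is a design choice; returning `nil` instead of `0.0` would make the distinction visible to callers.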
8 changes: 6 additions & 2 deletions lib/ruby_llm/provider.rb
@@ -70,8 +70,8 @@ def embed(text, model:, dimensions:)
      parse_embedding_response(response, model:, text:)
    end

    def paint(prompt, model:, size:)
      payload = render_image_payload(prompt, model:, size:)
    def paint(prompt, model:, size:, with:, params:)
      payload = render_image_payload(prompt, model:, size:, with:, params:)
      response = @connection.post images_url, payload
      parse_image_response(response, model:)
    end
@@ -121,6 +121,10 @@ def parse_tool_calls(_tool_calls)
      nil
    end

    def connection_multipart(config)
      @connection_multipart ||= ConnectionMultipart.new(self, config)
    end

    class << self
      def name
        to_s.split('::').last
2 changes: 1 addition & 1 deletion lib/ruby_llm/providers/gemini/images.rb
@@ -9,7 +9,7 @@ def images_url
          "models/#{@model}:predict"
        end

        def render_image_payload(prompt, model:, size:)
        def render_image_payload(prompt, model:, size:, with:, params:) # rubocop:disable Lint/UnusedMethodArgument
          RubyLLM.logger.debug "Ignoring size #{size}. Gemini does not support image size customization."
          @model = model
          {
4 changes: 3 additions & 1 deletion lib/ruby_llm/providers/openai/capabilities.rb
@@ -10,6 +10,7 @@ module Capabilities
        MODEL_PATTERNS = {
          dall_e: /^dall-e/,
          chatgpt4o: /^chatgpt-4o/,
          gpt_image: /^gpt-image/,
          gpt41: /^gpt-4\.1(?!-(?:mini|nano))/,
          gpt41_mini: /^gpt-4\.1-mini/,
          gpt41_nano: /^gpt-4\.1-nano/,
@@ -105,6 +106,7 @@ def supports_json_mode?(model_id)
        end

        PRICES = {
          gpt_image_1: { input_text: 5.0, input_image: 10.0, output: 8.0, cached_input: 0.5 }, # rubocop:disable Naming/VariableNumber
          gpt41: { input: 2.0, output: 8.0, cached_input: 0.5 },
          gpt41_mini: { input: 0.4, output: 1.6, cached_input: 0.1 },
          gpt41_nano: { input: 0.1, output: 0.4 },
@@ -168,7 +170,7 @@ def model_type(model_id)
          when /embedding/ then 'embedding'
          when /^tts|whisper|gpt4o_(?:mini_)?(?:transcribe|tts)$/ then 'audio'
          when 'moderation' then 'moderation'
          when /dall/ then 'image'
          when /dall-e|gpt-image/ then 'image'
          else 'chat'
          end
        end
76 changes: 73 additions & 3 deletions lib/ruby_llm/providers/openai/images.rb
@@ -5,13 +5,49 @@ module Providers
    class OpenAI
      # Image generation methods for the OpenAI API integration
      module Images
        def paint(prompt, model:, size:, with:, params:)
          @operation = with.nil? ? :generation : :editing
          @connection = connection_multipart(@connection.config) if editing? && !multipart_middleware?(@connection)
          super
        end

        private

        def editing?
          @operation == :editing
        end

        def generating?
          @operation == :generation
        end

        def multipart_middleware?(connection)
          connection.connection.builder.handlers.include?(Faraday::Multipart::Middleware)
        end

        module_function

        def images_url
          generating? ? generation_url : edits_url
        end

        def generation_url
          'images/generations'
        end

        def render_image_payload(prompt, model:, size:)
        def edits_url
          'images/edits'
        end

        def render_image_payload(prompt, model:, size:, with:, params:)
          if generating?
            render_generation_payload(prompt, model:, size:)
          else
            render_edit_payload(prompt, model:, with:, params:)
          end
        end

        def render_generation_payload(prompt, model:, size:)
          {
            model: model,
            prompt: prompt,
@@ -20,16 +56,50 @@ def render_image_payload(prompt, model:, size:)
          }
        end

        def render_edit_payload(prompt, model:, with:, params:)
          content = Content.new(prompt, with)
          params[:image] = []
          content.attachments.each do |attachment|
            params[:image] << Faraday::UploadIO.new(StringIO.new(attachment.content), attachment.mime_type,
                                                    attachment.filename)
          end
          params.merge({
                         model:,
                         prompt: content.text,
                         n: 1
                       })
        end

        def parse_image_response(response, model:)
          if generating?
            parse_generation_response(response, model:)
          else
            parse_edit_response(response, model:)
          end
        end

        def parse_generation_response(response, model:)
          data = response.body
          image_data = data['data'].first

          Image.new(
            url: image_data['url'],
            mime_type: 'image/png', # DALL-E typically returns PNGs
            mime_type: 'image/png',
            revised_prompt: image_data['revised_prompt'],
            model_id: model,
            data: image_data['b64_json']
            data: image_data['b64_json'],
            usage: data['usage']
          )
        end

        def parse_edit_response(response, model:)
          data = response.body
          image_data = data['data'].first
          Image.new(
            mime_type: 'image/png',
            model_id: model,
            data: image_data['b64_json'],
            usage: data['usage']
          )
        end
Binary file added spec/fixtures/ruby_with_blue.png
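
A note on `lib/ruby_llm/providers/openai/images.rb`: the provider chooses between generation and editing based on whether `with` is `nil`, stores that choice in `@operation`, and `render_edit_payload` writes `params[:image]` into the hash the caller passed in. The sketch below shows one possible alternative that derives the operation per call and leaves the caller's hash untouched. It uses only the standard library, and all method names are hypothetical rather than RubyLLM API.

```ruby
# Hypothetical, stdlib-only sketch; these names are illustrative, not RubyLLM API.
def image_operation(with)
  with.nil? ? :generation : :editing
end

def images_url_for(operation)
  operation == :generation ? 'images/generations' : 'images/edits'
end

# Builds the edit payload without mutating the caller's params hash.
# As in the PR, image, model, prompt and n override caller-supplied keys.
def build_edit_payload(prompt, model:, images:, params: {})
  params.merge(image: images, model: model, prompt: prompt, n: 1)
end

caller_params = { size: '1024x1024' }.freeze
operation = image_operation('path/to/image.png')
payload = build_edit_payload('turn the logo green', model: 'gpt-image-1',
                                                    images: ['<upload part>'], params: caller_params)

puts images_url_for(operation)
puts payload.keys.inspect
```

Passing a frozen hash here demonstrates that the builder does not write to its input. Deriving the operation from `with` on each call also avoids keeping per-request state on a provider instance, which could matter if an instance is shared between concurrent calls.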