crmne · tpaulshippy · Apr 27, 2025 · Apr 27, 2025 · Apr 28, 2025 · Apr 28, 2025
diff --git a/docs/_core_features/image-generation.md b/docs/_core_features/image-generation.md
@@ -24,12 +24,13 @@ redirect_from:
 After reading this guide, you will know:
 
 *   How to generate images from text prompts.
+*   How to edit existing images with AI (local files and remote URLs).
 *   How to select different image generation models.
 *   How to specify image sizes (for supported models).
 *   How to access and save generated image data (URL or Base64).
 *   How to integrate image generation with Rails Active Storage.
 *   Tips for writing effective image prompts.
-*   How to handle errors during image generation.
+*   How to handle errors during image generation and editing.
 
 ## Basic Image Generation
 
@@ -62,6 +63,86 @@ puts "Model Used: #{image.model_id}"
 
 The `paint` method abstracts the differences between provider APIs.
 
+## Image Editing
+
+RubyLLM supports editing existing images using AI models like OpenAI's `gpt-image-1`. You can provide either local image files or remote URLs as the source image to edit.
+
+### Editing Local Images
+
+To edit a local image file, use the `with:` parameter to specify the path to your image:
+
+```ruby
+# Edit a local PNG file
+image = RubyLLM.paint(
+  "turn the logo to green", 
+  with: "path/to/your/image.png",
+  model: "gpt-image-1"
+)
+
+puts "Edited image generated!"
+puts "MIME Type: #{image.mime_type}"
+```
+
+**Important Requirements for Local Images:**
+- Only PNG, WEBP, or JPG files are supported for editing with `gpt-image-1`
+- The file must exist and be readable
+
+### Editing Remote Images
+
+You can also edit images from remote URLs:
+
+```ruby
+# Edit an image from a remote URL
+image = RubyLLM.paint(
+  "make the background more vibrant",
+  with: "https://example.com/image.png",
+  model: "gpt-image-1"
+)
+```
+
+**Requirements for Remote URLs:**
+- The URL must return a PNG, WEBP, or JPG image
+- The server must respond with a valid content type
+- 404 errors will raise a `Faraday::ResourceNotFound` error
+- Invalid content types will raise a `RubyLLM::BadRequestError`
+
+### Editing Multiple Images
+
+You can edit multiple images at once by providing an array of file paths or URLs:
+
+```ruby
+# Edit multiple images simultaneously
+image = RubyLLM.paint(
+  "apply a vintage filter to all images",
+  with: [
+    "path/to/first_image.png",
+    "path/to/second_image.png"
+  ],
+  model: "gpt-image-1"
+)
+```
+
+### Customizing Edit Parameters
+
+The `params:` option allows you to customize the editing process:
+
+```ruby
+# Customize the editing output
+image = RubyLLM.paint(
+  "enhance the colors and add dramatic lighting",
+  with: "path/to/image.png",
+  model: "gpt-image-1",
+  params: {
+    size: "1024x1024",    # Output image size
+    quality: "low"        # Quality setting for gpt-image-1 (low, medium, high, or auto)
+  }
+)
+
+# Check usage information
+puts "Output tokens: #{image.usage['output_tokens']}"
+puts "Total cost: $#{image.total_cost}"
+```
+
 ## Choosing Models
 
 By default, RubyLLM uses the model specified in `config.default_image_model`, but you can specify a different one.
@@ -258,20 +339,25 @@ See the [Error Handling Guide]({% link _advanced/error-handling.md %}) for compr
 
 ## Content Safety
 
-AI image generation services have content safety filters. Prompts requesting harmful, explicit, or otherwise prohibited content will usually result in a `BadRequestError`. Avoid generating:
+AI image generation and editing services have content safety filters. Prompts requesting harmful, explicit, or otherwise prohibited content will usually result in a `BadRequestError`. Avoid generating or editing:
 
 *   Violent or hateful imagery.
 *   Sexually explicit content.
 *   Images of real people (especially public figures without consent, though policies vary).
 *   Direct copies of copyrighted characters or artwork.
 
+**Additional considerations for image editing:**
+*   Be mindful of editing copyrighted images without permission.
+*   Some providers may have stricter policies for editing existing images versus generating new ones.
+
 ## Performance Considerations
 
-Image generation can take several seconds (typically 5-20 seconds depending on the model and load).
+Image generation and editing can take several seconds (typically 5-20 seconds depending on the model and load).
 
-*   **Use Background Jobs:** In web applications, always perform image generation in a background job (like Sidekiq or GoodJob) to avoid blocking web requests.
+*   **Use Background Jobs:** In web applications, always perform image generation and editing in a background job (like Sidekiq or GoodJob) to avoid blocking web requests.
 *   **Timeouts:** Configure appropriate network timeouts in RubyLLM (see [Configuration Guide]({% link _getting_started/configuration.md %})).
-*   **Caching:** Store generated images (e.g., using Active Storage, cloud storage) rather than regenerating them frequently if the prompt is the same.
+*   **Caching:** Store generated and edited images (e.g., using Active Storage, cloud storage) rather than regenerating them when the prompt and source image are unchanged.
+*   **Image Editing Considerations:** When editing remote images, factor in additional time for downloading the source image before processing begins.
 
 ## Next Steps
 

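Note on the local-file requirements documented above (supported format, file must exist and be readable): these can be checked before calling `RubyLLM.paint`, so a bad path fails early with a clear message. The following is a minimal sketch, not part of this PR. The helper `local_edit_source_problem` and the constant `SUPPORTED_EDIT_EXTENSIONS` are hypothetical names, accepting `.jpeg` alongside `.jpg` is an assumption, and an extension check is only a heuristic for the real file format.

```ruby
# Hypothetical allowlist based on the formats named in the docs (PNG, WEBP, JPG).
# Accepting ".jpeg" as an alias for ".jpg" is an assumption.
SUPPORTED_EDIT_EXTENSIONS = %w[.png .webp .jpg .jpeg].freeze

# Returns nil when the local path looks usable as an edit source,
# otherwise a short description of the first problem found.
def local_edit_source_problem(path)
  return "file not found: #{path}" unless File.file?(path)
  return "file not readable: #{path}" unless File.readable?(path)

  ext = File.extname(path).downcase
  return "unsupported extension: #{ext}" unless SUPPORTED_EDIT_EXTENSIONS.include?(ext)

  nil
end

problem = local_edit_source_problem('path/to/your/image.png')
puts(problem || 'source looks usable for editing')
```

Remote URLs are deliberately left out of this check, since the docs above tie their validity to the server's response and content type rather than to the path.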
diff --git a/lib/ruby_llm.rb b/lib/ruby_llm.rb
@@ -4,6 +4,8 @@
 require 'event_stream_parser'
 require 'faraday'
 require 'faraday/retry'
+require 'faraday/multipart'
+
 require 'json'
 require 'logger'
 require 'securerandom'

diff --git a/lib/ruby_llm/connection_multipart.rb b/lib/ruby_llm/connection_multipart.rb
@@ -0,0 +1,19 @@
+# frozen_string_literal: true
+
+module RubyLLM
+  # A connection that uses multipart/form-data for file uploads
+  class ConnectionMultipart < Connection
+    def post(url, payload, &)
+      @connection.post url, payload do |req|
+        req.headers.merge! @provider.headers if @provider.respond_to?(:headers)
+        req.headers['Content-Type'] = 'multipart/form-data'
+        yield req if block_given?
+      end
+    end
+
+    def setup_middleware(faraday)
+      super
+      faraday.request :multipart, content_type: 'multipart/form-data'
+    end
+  end
+end
diff --git a/lib/ruby_llm/image.rb b/lib/ruby_llm/image.rb
@@ -3,14 +3,15 @@
 module RubyLLM
   # Represents a generated image from an AI model.
   class Image
-    attr_reader :url, :data, :mime_type, :revised_prompt, :model_id
+    attr_reader :url, :data, :mime_type, :revised_prompt, :model_id, :usage
 
-    def initialize(url: nil, data: nil, mime_type: nil, revised_prompt: nil, model_id: nil)
+    def initialize(url: nil, data: nil, mime_type: nil, revised_prompt: nil, model_id: nil, usage: {}) # rubocop:disable Metrics/ParameterLists
       @url = url
       @data = data
       @mime_type = mime_type
       @revised_prompt = revised_prompt
       @model_id = model_id
+      @usage = usage
     end
 
     def base64?
@@ -36,14 +37,32 @@ def self.paint(prompt, # rubocop:disable Metrics/ParameterLists
                    provider: nil,
                    assume_model_exists: false,
                    size: '1024x1024',
-                   context: nil)
+                   context: nil,
+                   with: nil,
+                   params: {})
       config = context&.config || RubyLLM.config
       model ||= config.default_image_model
       model, provider_instance = Models.resolve(model, provider: provider, assume_exists: assume_model_exists,
                                                        config: config)
       model_id = model.id
 
-      provider_instance.paint(prompt, model: model_id, size:)
+      provider_instance.paint(prompt, model: model_id, size:, with:, params:)
+    end
+
+    def total_cost
+      input_cost + output_cost
+    end
+
+    def model_info
+      @model_info ||= RubyLLM.models.find(model_id)
+    end
+
+    def input_cost
+      (usage || {}).fetch('input_tokens', 0) * model_info.input_price_per_million / 1_000_000
+    end
+
+    def output_cost
+      (usage || {}).fetch('output_tokens', 0) * model_info.output_price_per_million / 1_000_000
     end
   end
 end
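
Note on the cost methods above: `total_cost` is the sum of two per-token costs, each computed as tokens times price per million, divided by 1,000,000. The sketch below reproduces that arithmetic in isolation and treats missing usage data as zero tokens. `PriceInfo` and `image_total_cost` are hypothetical stand-ins for the model lookup and the instance methods in the diff, and the prices are illustrative values only.

```ruby
# Hypothetical stand-in for the pricing record that the diff obtains
# through RubyLLM.models.find(model_id).
PriceInfo = Struct.new(:input_price_per_million, :output_price_per_million)

# Same arithmetic as input_cost + output_cost in the diff, except that a nil
# usage hash or a missing token count is counted as zero tokens.
def image_total_cost(usage, price_info)
  usage ||= {}
  input_tokens = usage['input_tokens'].to_i
  output_tokens = usage['output_tokens'].to_i

  input_cost = input_tokens * price_info.input_price_per_million / 1_000_000.0
  output_cost = output_tokens * price_info.output_price_per_million / 1_000_000.0
  input_cost + output_cost
end

prices = PriceInfo.new(10.0, 8.0) # illustrative prices, not authoritative
puts image_total_cost({ 'input_tokens' => 1_000, 'output_tokens' => 2_000 }, prices)
puts image_total_cost(nil, prices)
```

Dividing by a float keeps the result fractional even if a price were stored as an integer.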
diff --git a/lib/ruby_llm/provider.rb b/lib/ruby_llm/provider.rb
@@ -70,8 +70,8 @@ def embed(text, model:, dimensions:)
       parse_embedding_response(response, model:, text:)
     end
 
-    def paint(prompt, model:, size:)
-      payload = render_image_payload(prompt, model:, size:)
+    def paint(prompt, model:, size:, with:, params:)
+      payload = render_image_payload(prompt, model:, size:, with:, params:)
       response = @connection.post images_url, payload
       parse_image_response(response, model:)
     end
@@ -121,6 +121,10 @@ def parse_tool_calls(_tool_calls)
       nil
     end
 
+    def connection_multipart(config)
+      @connection_multipart ||= ConnectionMultipart.new(self, config)
+    end
+
     class << self
       def name
         to_s.split('::').last

diff --git a/lib/ruby_llm/providers/gemini/images.rb b/lib/ruby_llm/providers/gemini/images.rb
@@ -9,7 +9,7 @@ def images_url
           "models/#{@model}:predict"
         end
 
-        def render_image_payload(prompt, model:, size:)
+        def render_image_payload(prompt, model:, size:, with:, params:) # rubocop:disable Lint/UnusedMethodArgument
           RubyLLM.logger.debug "Ignoring size #{size}. Gemini does not support image size customization."
           @model = model
           {

diff --git a/lib/ruby_llm/providers/openai/capabilities.rb b/lib/ruby_llm/providers/openai/capabilities.rb
@@ -10,6 +10,7 @@ module Capabilities
         MODEL_PATTERNS = {
           dall_e: /^dall-e/,
           chatgpt4o: /^chatgpt-4o/,
+          gpt_image: /^gpt-image/,
           gpt41: /^gpt-4\.1(?!-(?:mini|nano))/,
           gpt41_mini: /^gpt-4\.1-mini/,
           gpt41_nano: /^gpt-4\.1-nano/,
@@ -105,6 +106,7 @@ def supports_json_mode?(model_id)
         end
 
         PRICES = {
+          gpt_image_1: { input_text: 5.0, input_image: 10.0, output: 8.0, cached_input: 0.5 }, # rubocop:disable Naming/VariableNumber
           gpt41: { input: 2.0, output: 8.0, cached_input: 0.5 },
           gpt41_mini: { input: 0.4, output: 1.6, cached_input: 0.1 },
           gpt41_nano: { input: 0.1, output: 0.4 },
@@ -168,7 +170,7 @@ def model_type(model_id)
           when /embedding/ then 'embedding'
           when /^tts|whisper|gpt4o_(?:mini_)?(?:transcribe|tts)$/ then 'audio'
           when 'moderation' then 'moderation'
-          when /dall/ then 'image'
+          when /dall-e|gpt-image/ then 'image'
           else 'chat'
           end
         end

diff --git a/lib/ruby_llm/providers/openai/images.rb b/lib/ruby_llm/providers/openai/images.rb
@@ -5,13 +5,49 @@ module Providers
     class OpenAI
       # Image generation methods for the OpenAI API integration
       module Images
+        def paint(prompt, model:, size:, with:, params:)
+          @operation = with.nil? ? :generation : :editing
+          @connection = connection_multipart(@connection.config) if editing? && !multipart_middleware?(@connection)
+          super
+        end
+
+        private
+
+        def editing?
+          @operation == :editing
+        end
+
+        def generating?
+          @operation == :generation
+        end
+
+        def multipart_middleware?(connection)
+          connection.connection.builder.handlers.include?(Faraday::Multipart::Middleware)
+        end
+
         module_function
 
         def images_url
+          generating? ? generation_url : edits_url
+        end
+
+        def generation_url
           'images/generations'
         end
 
-        def render_image_payload(prompt, model:, size:)
+        def edits_url
+          'images/edits'
+        end
+
+        def render_image_payload(prompt, model:, size:, with:, params:)
+          if generating?
+            render_generation_payload(prompt, model:, size:)
+          else
+            render_edit_payload(prompt, model:, with:, params:)
+          end
+        end
+
+        def render_generation_payload(prompt, model:, size:)
           {
             model: model,
             prompt: prompt,
@@ -20,16 +56,50 @@ def render_image_payload(prompt, model:, size:)
           }
         end
 
+        def render_edit_payload(prompt, model:, with:, params:)
+          content = Content.new(prompt, with)
+          params[:image] = []
+          content.attachments.each do |attachment|
+            params[:image] << Faraday::UploadIO.new(StringIO.new(attachment.content), attachment.mime_type,
+                                                    attachment.filename)
+          end
+          params.merge({
+                         model:,
+                         prompt: content.text,
+                         n: 1
+                       })
+        end
+
         def parse_image_response(response, model:)
+          if generating?
+            parse_generation_response(response, model:)
+          else
+            parse_edit_response(response, model:)
+          end
+        end
+
+        def parse_generation_response(response, model:)
           data = response.body
           image_data = data['data'].first
 
           Image.new(
             url: image_data['url'],
-            mime_type: 'image/png', # DALL-E typically returns PNGs
+            mime_type: 'image/png',
             revised_prompt: image_data['revised_prompt'],
             model_id: model,
-            data: image_data['b64_json']
+            data: image_data['b64_json'],
+            usage: data['usage']
+          )
+        end
+
+        def parse_edit_response(response, model:)
+          data = response.body
+          image_data = data['data'].first
+          Image.new(
+            mime_type: 'image/png',
+            model_id: model,
+            data: image_data['b64_json'],
+            usage: data['usage']
           )
         end
       end
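
Note on `render_edit_payload` above: it assigns `params[:image]` on the hash it receives, so a hash passed in by the caller gains an `:image` key as a side effect. One possible alternative is to build the payload with a single `merge`, which returns a new hash. This is a sketch only. `build_edit_payload` is a hypothetical name, and the `uploads` array holds plain strings standing in for the upload parts that the real method builds from attachments.

```ruby
# Builds the edit payload without modifying the caller's params hash.
# `uploads` stands in for the multipart upload objects built from attachments.
# As in the diff, model, prompt, and n take precedence over caller params.
def build_edit_payload(prompt, model:, uploads:, params: {})
  params.merge(image: uploads, model: model, prompt: prompt, n: 1)
end

caller_params = { size: '1024x1024', quality: 'low' }
payload = build_edit_payload('turn the logo green',
                             model: 'gpt-image-1',
                             uploads: ['first_image.png'],
                             params: caller_params)

puts payload.keys.inspect
puts caller_params.keys.inspect
```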

diff --git a/spec/fixtures/ruby_with_blue.png b/spec/fixtures/ruby_with_blue.png
diff --git a/...tes/RubyLLM_Image/edit_functionality_OpenAI_/with_local_files/customizes_image_output.yml b/...tes/RubyLLM_Image/edit_functionality_OpenAI_/with_local_files/customizes_image_output.yml
diff --git a/...e/edit_functionality_OpenAI_/with_local_files/rejects_edits_with_a_non-PNG_local_file.yml b/...e/edit_functionality_OpenAI_/with_local_files/rejects_edits_with_a_non-PNG_local_file.yml
diff --git a/...it_functionality_OpenAI_/with_local_files/supports_image_edits_with_a_valid_local_PNG.yml b/...it_functionality_OpenAI_/with_local_files/supports_image_edits_with_a_valid_local_PNG.yml