backblaze-b2-samples/awesome-image-generation

Awesome Image Generation

A curated list of AI image generation APIs, SDKs, and production-ready tools. Focused on services developers can integrate today.

Last verified: March 2026

Related Lists


Contents


Text-to-Image APIs

  • OpenAI GPT Image – gpt-image-1, gpt-image-1.5, gpt-image-1-mini. Natively multimodal generation, editing, and inpainting. DALL-E 2/3 deprecated May 2026. API Reference | SDK: Python, Node
  • Black Forest Labs (FLUX Pro) – FLUX 1.1 Pro and FLUX.2 (32B params) via REST API. Founded by researchers behind the original Stable Diffusion. Docs | Also on Replicate, fal.ai, Together AI
  • Stability AI – Stable Diffusion 3.5 and Stable Image via REST API. Text-to-image, image-to-image, upscaling, inpainting. Docs
  • Google Imagen (Vertex AI) – Imagen 4 via Vertex AI. Text-to-image, editing, outpainting, inpainting, customization. Docs | API Reference | SDK: Python (google-cloud-aiplatform), Node
  • Adobe Firefly API – Image generation, editing, Photoshop automation, and Lightroom operations. Part of Firefly Services platform. Docs | SDK: JS/TS (official)
  • Ideogram – Known for high-quality text rendering in images. Ideogram 3.0 supports generation, remix, edit, and character reference. OpenAI-compatible interface. Docs
  • Recraft AI – Raster and vector image generation. V4 model (Feb 2026). Background removal, inpainting, outpainting, vectorization. OpenAI-compatible interface. Docs | ComfyUI Plugin
  • Midjourney – Official API released late 2025. Enterprise/Pro plan holders only; no public self-service access. Docs
  • Amazon Titan Image Generator – Text-to-image via AWS Bedrock. Image conditioning, color palette guidance, background removal, and variations. Docs | SDK: Python (boto3), Java, PHP
  • Leonardo AI – Text-to-image, image-to-image, and image-to-video. Webhooks, LoRA models, and "Get API Code" export from web UI. Docs | SDK: TypeScript, Python
  • fal.ai – Serverless inference hosting 1000+ image models. Marketed as the fastest diffusion inference engine. Hosts FLUX, SD, and more. SOC 2 compliant. Docs | SDK: Python, JS
  • PixelAPI – Pay-per-use AI image API running SDXL, FLUX Schnell, and FLUX Dev on owned GPUs. Background removal, 4x upscaling, text-to-image, and image-to-image at $0.003-$0.005/image. Docs | SDK: pip install pixelapi / npm install pixelapi

Open Source Models

FLUX Family (Black Forest Labs)

  • FLUX.1 [schnell] – 12B param rectified flow transformer. 1-4 step generation. Fully open for commercial use. Apache 2.0. GitHub
  • FLUX.1 [dev] – 12B param guidance-distilled model. High quality, competitive with closed-source. Non-commercial license.
  • FLUX.2 [dev] – 32B param model with generation, editing, and multi-reference combining.

Stable Diffusion Family (Stability AI)

  • Stable Diffusion 1.5 – 860M UNet, runs on consumer GPUs. Foundation for massive community ecosystem of LoRAs, fine-tunes, and extensions.
  • Stable Diffusion XL (SDXL) – Native 1024x1024. Improved text-in-image and limb generation. Base + refiner pipeline.
  • Stable Diffusion 3.5 Large – MMDiT architecture with three text encoders (including T5-XXL). Highest-quality Stability open model. A Turbo variant is also available.
  • SDXL-Turbo – Adversarial distillation of SDXL enabling single-step generation.

Other Open Models

  • PixArt-Alpha / PixArt-Sigma – DiT-based T2I at 10.8% of SD1.5 training cost. Near-commercial quality. HuggingFace
  • Kandinsky 3 – Open-source T2I from AI Forever. 2x larger U-Net and 10x larger text encoder vs v2.x. HuggingFace
  • DeepFloyd IF – Cascaded pixel-space diffusion (64px → 256px → 1024px). Strong text rendering. Zero-Shot FID 6.66 on COCO.
  • Playground v2.5 – Aesthetic-focused model built on the SDXL architecture.
  • LCM / LCM-LoRA – Latent Consistency Models enabling 2-4 step generation. LCM-LoRA is a lightweight ~100MB adapter for any SDXL model. Diffusers Docs

Open Source Frameworks and UIs

  • ComfyUI – Node-based graph UI and backend for diffusion models. Highly customizable, API-accessible. Supports SD, SDXL, Flux, and modern models. GPL-3.0. Docs
  • AUTOMATIC1111 WebUI – Most widely used Gradio-based SD web UI. 161k+ stars. Extensive extension ecosystem. AGPL-3.0. Wiki
  • Fooocus – Midjourney-inspired SDXL UI. Prompt-only workflow, no manual parameter tweaking. GPL-3.0.
  • InvokeAI – Creative engine for SD models targeting professionals. Ships a full-featured WebUI. Apache 2.0. Website
  • Forge – Fork of AUTOMATIC1111 with improved GPU memory management and performance. Compatible with A1111 extensions. AGPL-3.0.

Image Editing and Enhancement

  • ControlNet – Precise structural control for diffusion models via edge maps, depth, pose, normals. Available for SD1.5, SDXL, and Flux. A1111 Extension
  • IP-Adapter – Lightweight adapter (~100MB) for image-based prompting. New cross-attention layers for image feature conditioning. Diffusers Docs
  • Real-ESRGAN – Image and video upscaler, up to 8x. Handles real-world blind super-resolution with noise/artifact removal. BSD 3-Clause. Replicate
  • GFPGAN – Face restoration from Tencent ARC. Restores facial details from degraded images. Often paired with Real-ESRGAN.

SDKs and Developer Tooling

  • HuggingFace Diffusers – The canonical PyTorch library for diffusion models. SD 1.5, SDXL, SD3, Flux, ControlNet, IP-Adapter, and more. Docs | pip install diffusers
  • Replicate SDK – Python/JS client for 50,000+ hosted ML models. Pay-per-second, no GPU management. Docs | pip install replicate / npm install replicate
  • OpenAI SDK – Official SDK for GPT Image generation and editing. client.images.generate() and client.images.edit(). pip install openai / npm install openai
  • fal.ai SDK – Python and JS SDKs for serverless inference. Also a Vercel AI SDK provider. Docs | pip install fal-client / npm install @fal-ai/client
  • Gradio – Python library for building interactive ML demos and web UIs. Foundation for AUTOMATIC1111, Fooocus, and HuggingFace Spaces. Includes gradio-client for programmatic access. GitHub | pip install gradio
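
As a rough illustration of the `client.images.generate()` call named in the OpenAI SDK entry above, the sketch below requests one image and writes it to disk. It assumes the `openai` package is installed, that `OPENAI_API_KEY` is set in the environment, and that the model returns base64 data in `data[0].b64_json`. The helper names `build_image_request` and `generate_png`, the default size, and the output path are hypothetical and exist only for this example.

```python
import base64


def build_image_request(prompt, model="gpt-image-1", size="1024x1024"):
    """Assemble keyword arguments for client.images.generate().

    Hypothetical helper: keeps the request shape in one place so it can be
    inspected without making a network call.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {"model": model, "prompt": prompt, "size": size}


def generate_png(prompt, out_path="out.png"):
    """Generate one image and save it as a PNG (requires network access)."""
    # Imported lazily so the helper above works without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.images.generate(**build_image_request(prompt))
    # Assumption: the image comes back as base64 in data[0].b64_json.
    image_bytes = base64.b64decode(response.data[0].b64_json)
    with open(out_path, "wb") as f:
        f.write(image_bytes)
    return out_path
```

Calling `generate_png("a red fox in snow")` would write `out.png` to the working directory. Splitting the request builder from the network call makes the parameters easy to log or unit-test.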

Infrastructure and Deployment

GPU Cloud Providers

  • RunPod – GPU pods and serverless endpoints. 48% of serverless cold starts under 200ms. Docs
  • Modal – Serverless Python GPU cloud. Sub-second cold starts. Docs | pip install modal
  • Lambda Labs – On-demand A100 and H100 GPUs. Competitive pricing (~$1.10/hr A100 80GB). Docs
  • Replicate – Serverless model hosting for open-source image models. Docs
  • fal.ai – Marketed as the fastest diffusion inference engine. 1000+ hosted models. Docs
  • Together AI – Inference API for 200+ open models. Docs

Image Storage and Delivery

  • Cloudinary – Enterprise image/video CDN with AI-powered transformations. Docs | SDK: Python, Node, Ruby, PHP, Java, .NET
  • Imgix – Real-time image processing CDN. URL-parameter-based transforms. Connects to existing S3/GCS storage. Docs
  • Cloudflare Images – Image CDN on Cloudflare's global network. Pre-defined variants for transformations.
  • Backblaze B2 – S3-compatible object storage at ~$0.006/GB/month. Free egress via Cloudflare. Docs
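
The URL-parameter-based transforms described for Imgix above can be sketched with the standard library alone. This is a minimal example, not an official client: the domain `example.imgix.net`, the image path, and the helper name `imgix_url` are hypothetical, while `w`, `h`, `fit`, and `auto` are assumed to be the usual rendering parameters for width, height, resize mode, and automatic format selection.

```python
from urllib.parse import quote, urlencode


def imgix_url(domain, path, **params):
    """Build a transform URL of the form https://<domain>/<path>?<params>."""
    # Percent-encode the path but keep "/" separators intact.
    base = f"https://{domain}/{quote(path.lstrip('/'))}"
    if not params:
        return base
    # Keyword arguments keep their insertion order, so the query is stable.
    return f"{base}?{urlencode(params)}"


url = imgix_url(
    "example.imgix.net",      # hypothetical source domain
    "generated/fox.png",      # hypothetical object key in the origin bucket
    w=512, h=512, fit="crop", auto="format",
)
print(url)
```

Because each transform is just a query string, the same stored original can serve thumbnails, previews, and full-size renders without extra copies in object storage.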

Evaluation and Observability

  • pytorch-fid – PyTorch FID (Fréchet Inception Distance) implementation. Measures distribution similarity between real and generated images. pip install pytorch-fid
  • torch-fidelity – High-fidelity ISC, FID, KID, and PRC metrics. Supports InceptionV3, CLIP, DINOv2, VGG16 feature extractors. Docs | pip install torch-fidelity
  • ImageReward – First general-purpose human preference reward model for T2I (NeurIPS 2023). Trained on 137k expert comparison pairs. Paper
  • IQA-PyTorch – Comprehensive image quality toolbox. PSNR, SSIM, LPIPS, FID, NIQE, MUSIQ, TOPIQ, NIMA, BRISQUE, and more.
  • CLIP Score – Measures semantic alignment between text prompts and generated images using CLIP embeddings. Available via torchmetrics.multimodal.CLIPScore.
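
To make the CLIP Score entry above concrete, here is a small sketch of the arithmetic behind it, assuming the commonly cited definition: 100 times the cosine similarity between the image and text embeddings, clamped at zero. The function names are hypothetical and the vectors are toy values; in practice the embeddings come from a CLIP model, for example through `torchmetrics.multimodal.CLIPScore`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def clip_score(image_embedding, text_embedding):
    """Assumed definition: max(100 * cos(image, text), 0)."""
    return max(100.0 * cosine_similarity(image_embedding, text_embedding), 0.0)


# Toy embeddings, not real CLIP outputs.
aligned = clip_score([3.0, 4.0], [3.0, 4.0])      # same direction
unrelated = clip_score([1.0, 0.0], [0.0, 1.0])    # orthogonal
opposed = clip_score([1.0, 0.0], [-1.0, 0.0])     # negative cosine, clamped
```

Higher values indicate closer agreement between prompt and image; the clamp means anti-correlated pairs score the same as unrelated ones.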

Templates and Example Projects

Learning Resources


Contributing

Contributions welcome! Please read the contribution guidelines first. PRs for new tools, corrections, and updates are appreciated.

License

CC0
