WD14 Captioner Plugin #54

`docs/generate/wd14_captioner.md` (new file, 64 additions, 0 deletions)

---
sidebar_position: 8
---

# Auto-Caption Images with WD14 Tagger (`wd14_captioner`)

This plugin uses the WD14 tagger (from kohya-ss/sd-scripts) to automatically generate Danbooru-style tags for image datasets. It is ideal for preparing high-quality captions for datasets used to fine-tune Stable Diffusion and similar models.

## Step 1: Prepare Your Image Dataset

Upload a dataset containing image files. The dataset must include an image column (default: `"image"`). You can configure the name of this column via the **Image Field** parameter.

> The model supports `.jpg`, `.jpeg`, `.png`, and `.webp` formats.
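
If you are assembling the dataset locally first, the expected shape is simply a dataset with an image column. A minimal sketch using the Hugging Face `datasets` library (an assumption for illustration, not a requirement of the plugin):

```python
# Minimal sketch: build a dataset with an "image" column from a folder
# of .jpg/.jpeg/.png/.webp files (assumes the `datasets` library).
from datasets import load_dataset

ds = load_dataset("imagefolder", data_dir="./my_images", split="train")
print(ds.column_names)  # should include "image"
```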

## Step 2: Configure Plugin Parameters

Use the parameters panel to control the tag generation behavior:

| Parameter | Description |
|----------|-------------|
| `Image Field` | Dataset column that contains the image files |
| `Tag Confidence Threshold` | Minimum confidence score a tag needs to be included |
| `General Threshold` | Optional threshold applied specifically to general (non-character) tags |
| `Character Threshold` | Optional threshold applied specifically to character tags |
| `ONNX Model Variant` | Choose between the ConvNeXt and ViT variants of WD14 |
| `Batch Size` | Number of images processed per inference batch |
| `Image Resize` | Size to which the shorter side of each image is resized before inference |
| `Caption Separator` | Character(s) used to join the generated tags |
| `Max Dataloader Workers` | Maximum number of workers used to load images during tagging |
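
For reference, a typical configuration might look like the sketch below. The key names are hypothetical (the UI exposes them as the labeled fields above), and the values shown are common WD14 conventions rather than guaranteed plugin defaults:

```python
# Hypothetical parameter set, mirroring the fields in the panel above.
params = {
    "image_field": "image",
    "tag_confidence_threshold": 0.35,  # common WD14 default
    "general_threshold": None,         # falls back to the main threshold
    "character_threshold": 0.85,       # character tags are often held to a stricter bar
    "onnx_model_variant": "wd-v1-4-convnext-tagger-v2.onnx",
    "batch_size": 8,
    "image_resize": 448,               # WD14 v2 models expect 448x448 inputs
    "caption_separator": ", ",
    "max_dataloader_workers": 4,
}
```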

## Step 3: Start the Job

Once your dataset is uploaded and parameters are configured, click the `Queue` button to start captioning. You can monitor job progress in the `Executions` tab.

When executed, the plugin will:

- Load your image dataset
- Run the selected WD14 model on each image
- Generate tags/captions based on your thresholds
- Save the results as a new dataset with two columns:
- `image` (original file path)
- `caption` (generated tags)
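
The flow above follows the standard WD14/ONNX inference pattern. Below is an illustrative sketch, not the plugin's actual code; the preprocessing details and the source of `tag_names` (usually the `selected_tags.csv` shipped alongside the model) are assumptions:

```python
# Illustrative WD14-style tagging with onnxruntime (not the plugin's code).
import numpy as np
import onnxruntime as ort
from PIL import Image

def tag_image(path, session, tag_names, threshold=0.35, separator=", "):
    # WD14 v2 models expect a 448x448 image; they were trained on BGR input.
    img = Image.open(path).convert("RGB").resize((448, 448))
    x = np.asarray(img, dtype=np.float32)[None, :, :, ::-1]  # NHWC, RGB -> BGR
    input_name = session.get_inputs()[0].name
    probs = session.run(None, {input_name: x})[0][0]  # one score per tag
    # Keep tags whose confidence clears the threshold, then join them.
    kept = [t for t, p in zip(tag_names, probs) if p >= threshold]
    return separator.join(kept)

session = ort.InferenceSession("wd-v1-4-convnext-tagger-v2.onnx")
# caption = tag_image("pokemon_1.png", session, tag_names)
```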

## Step 4: View the Output

After completion, you can view the new dataset inside the `Datasets` tab under `Generated Datasets`. The resulting dataset will contain the original images and a new column with generated captions.

You can also edit the captions and create a new dataset for downstream tasks like training, search, or labeling.

## Output Example

| image | caption |
|-------|---------|
| `pokemon_1.png` | `solo, simple_background, white_background, full_body, black_eyes, pokemon_(creature), no_humans, animal_focus` |
| `pokemon_2.jpg` | `solo, smile, open_mouth, simple_background, red_eyes, white_background, standing, full_body, pokemon_(creature), no_humans, fangs, bright_pupils, claws, white_pupils, bulbasaur` |
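
Because the caption is a plain separator-joined string, downstream code can trivially recover the tag list:

```python
caption = "solo, simple_background, white_background, full_body"
tags = [t.strip() for t in caption.split(",")]  # ['solo', 'simple_background', ...]
```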

## Model Variants

- `wd-v1-4-convnext-tagger-v2.onnx`: More accurate, but larger
- `wd-v1-4-vit-tagger-v2.onnx`: Lightweight alternative

These models will be automatically downloaded and cached if not already present.
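
Caching means the download cost is paid once. As an illustration, the equivalent manual fetch with `huggingface_hub` (assuming the models come from the SmilingWolf repositories commonly used by WD14 tooling):

```python
# One-time fetch; files land in the local Hugging Face cache and are
# reused on subsequent calls.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="SmilingWolf/wd-v1-4-convnext-tagger-v2",
    filename="model.onnx",
)
tags_csv = hf_hub_download(
    repo_id="SmilingWolf/wd-v1-4-convnext-tagger-v2",
    filename="selected_tags.csv",
)
```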

<img src={require('./gifs/wd14_captioner/wd14_captioner.gif').default} alt="WD14 Captioner Plugin in Action" width="500" />