Instance vs. Class Token with filewords? #667

andykaufseo · 2023-01-02T18:49:56Z

From the README:

> Instance Token: The unique identifier for your subject (sks, xyz). Leave blank for fine-tuning.

I'm not training a person (which is pretty straightforward); I want to train a new concept (like a bear with horns) or an action (like running, dancing, or boxing).

Until today, I understood that DreamBooth required both an instance token and a class token.
But what does "Leave blank for fine-tuning" mean?

I mean, I'm using filewords already, so if I leave the instance token blank, do I have to enter something for the class token or leave that blank as well? Or maybe use filewords there?
Can this DB implementation now be used to properly fine-tune SD? Is that what this means?

For the prompts, I already use [filewords]; that part is pretty straightforward, but for the instance token and class token, things are a bit unclear.

If @d8ahazard or anyone else can help, I'm sure everyone would benefit from this knowledge.

d8ahazard · 2023-01-02T19:02:20Z · Maintainer

So, it works like this...

If you, say, have 50 images of a boxer in various poses and want to teach the robot how to make more pictures of boxers, I'd probably recommend using some kind of tool to caption all of your photos, and making sure "boxer" is in each caption. Then you just use [filewords] for your instance/sample prompt and start training.
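Before training, it can help to confirm that every caption actually mentions the subject. The sketch below is not the extension's loader; it assumes each image has a sidecar `.txt` caption with the same base name (falling back to the filename when there is none), which is a common convention but an assumption here. `load_captions` and `missing_subject` are hypothetical helper names.

```python
import re
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}

def load_captions(dataset_dir):
    """Map each image in dataset_dir to its caption text.

    Assumption: a caption lives in a sidecar .txt file with the same stem
    (photo01.png -> photo01.txt); without one, the filename stem is used,
    with underscores turned into spaces.
    """
    captions = {}
    for img in sorted(Path(dataset_dir).iterdir()):
        if img.suffix.lower() not in IMAGE_EXTS:
            continue
        sidecar = img.with_suffix(".txt")
        if sidecar.exists():
            captions[img] = sidecar.read_text(encoding="utf-8").strip()
        else:
            captions[img] = img.stem.replace("_", " ")
    return captions

def missing_subject(captions, subject):
    """List images whose caption lacks `subject` as a whole word (case-insensitive)."""
    pattern = re.compile(rf"\b{re.escape(subject)}\b", re.IGNORECASE)
    return [img for img, text in captions.items() if not pattern.search(text)]
```

For example, `missing_subject(load_captions("my-captioned-pics"), "boxer")` returns the images you would want to re-caption before training.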

That's for "classic fine-tuning", where you're just training on pictures with captions.

"Dreambooth" training is more specific to a unique subject, like a person or a specific pet. In this example, say you have 50 images of a person. I would still recommend captioning all of your images and, again, using [filewords], but this time for the class/instance/sample prompts.

What makes this different is that we now add in the class/instance tokens. You'd want to ensure that all of your captions have "man", "woman", or "person" in them; it doesn't matter which, as long as it's the same throughout.
Now, we set our instance token to a unique word that the model doesn't know already, like "xabyzz". Doesn't really matter what the token is, as long as you remember it and it's not in the model already. The class token is then "person" or "woman" or "man" or whatever is in all of your prompts.
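A quick way to check the "same throughout" requirement, as a sketch: given the caption strings (however you loaded them), it reports which candidate class words appear in every caption. The helper name and the default candidate list are only examples. Whole-word matching matters here, because "man" is a substring of "woman".

```python
import re

def common_class_words(captions, candidates=("man", "woman", "person")):
    """Return the candidate class words that appear, as whole words, in every caption."""
    def has_word(text, word):
        # \b boundaries stop "man" from matching inside "woman".
        return re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE) is not None
    return [w for w in candidates if all(has_word(c, w) for c in captions)]
```

If the result is empty, no single class word is shared by all captions, and some captions need editing before one of them can serve as the class token.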

With me so far?

On the back end, this lets us prepend "xabyzz" to the class word when training on the subject ("a photo of xabyzz person on a sailboat"), and use the clean prompt "a photo of a person on a sailboat" when generating class images.

Also, if your prompts already have the subject in them ("a photo of xabyzz on a sailboat"), it will automagically swap xabyzz with "person" when generating class prompts, and inject "person" after xabyzz for instance prompts.
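To make the prompt handling above concrete, here is a minimal sketch of the three behaviors described: put the instance token in front of the class word, add the class word after an instance token that stands alone, and strip the instance token for class prompts. It is not the extension's actual code, the function names are hypothetical, and exact output (articles, word order) may differ from the real implementation. It assumes plain whitespace-separated matching, so a token with punctuation attached ("xabyzz,") would not be recognized.

```python
def build_instance_prompt(caption, instance_token, class_token):
    """Sketch: make sure the caption names the subject as '<instance> <class>'."""
    words = caption.split()
    if instance_token in words:
        idx = words.index(instance_token)
        # Subject already named: inject the class word after it if missing.
        if idx + 1 >= len(words) or words[idx + 1] != class_token:
            words.insert(idx + 1, class_token)
    elif class_token in words:
        # Only the class word is present: put the instance token in front of it.
        words.insert(words.index(class_token), instance_token)
    else:
        # Assumption: with neither word present, prefix both to the caption.
        words = [instance_token, class_token] + words
    return " ".join(words)

def build_class_prompt(caption, instance_token, class_token):
    """Sketch: remove the instance token so the prompt describes a generic member of the class."""
    words = caption.split()
    out = []
    for i, word in enumerate(words):
        if word != instance_token:
            out.append(word)
            continue
        # Swap the instance token for the class word, unless the class word already follows it.
        following = words[i + 1] if i + 1 < len(words) else None
        if following != class_token:
            out.append(class_token)
    return " ".join(out)
```

With these sketches, `build_class_prompt("a photo of xabyzz on a sailboat", "xabyzz", "person")` gives "a photo of person on a sailboat", and `build_instance_prompt` on the same caption gives "a photo of xabyzz person on a sailboat".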

andykaufseo · Jan 2, 2023 · Author

_If you, say, have 50 images of a boxer in various poses, and want to teach the robot how to make more pictures of boxes, I'd probably recommend you use some kind of tool to caption all of your photos, and ensure "boxer" is in each prompt. Then, you just use [filewords] for your instance/sample prompt, and start training.

That's for "classic fine-tuning", where you're just training on pictures with captions._

So you're saying that with your DB extension we can do fine-tuning? Or were you talking about fine-tuning in general (using other scripts/methods)?

I already have all my images captioned, so if I wanted to fine-tune, I would leave the instance and class tokens empty and just add [filewords] to the prompt fields?

jsbach-jung · Jan 7, 2023

Hope this helps; I'll share what works for me. For context, I currently train with the Dec. 11th release: a lot changed sometime after that (I'm not sure what), and my training process no longer really worked.

First, I picked a single token from: https://huggingface.co/runwayml/stable-diffusion-v1-5/raw/main/tokenizer/vocab.json
So let’s suppose my wife’s name is Natalie. I pick “natl” as the token because it seems to produce pictures of random flags with no particular association. You could also use “cec” if your friend is named Cecil; it appears to generate random landscapes. Pick a token that seems to have no particular meaning attached to it, and verify this by using the token as your prompt in your base model.
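The vocab.json linked above is a JSON object mapping token strings to IDs. As a sketch, assuming CLIP's convention that a whole word stored as a single token carries a trailing "</w>" marker and that the tokenizer lowercases its input, the helper below checks whether a candidate is one token. It says nothing about what the model associates with that token; the prompt test in the base model is still how you check that. The helper names are hypothetical.

```python
import json
from pathlib import Path

def load_vocab(path):
    """Load a CLIP-style tokenizer vocab.json (token string -> integer id)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))

def is_single_token(word, vocab):
    """True if `word` exists as one whole-word token.

    Assumption: word-final BPE tokens end with "</w>" (CLIP convention) and
    input is lowercased before tokenizing.
    """
    return f"{word.lower()}</w>" in vocab
```

Usage: `vocab = load_vocab("vocab.json")`, then `is_single_token("natl", vocab)` before spending time testing the token as a prompt.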

Second, pick your base model. I suggest just using the v1.5 fine-tune, the one that’s nearly 8 GB.

Go to the “Create Model” tab, give it a name, and pick the base model from the list. Check “Extract EMA”, unless you want to further fine-tune the result of your model.

Create the model. There shouldn’t be issues with the v1.5 fine-tune, but other merged models can have some gaps, especially if they were at one point a safetensors file. If there are errors here, you’ll have to find a way to resolve them.

Now your model’s name should be in the dropdown box at the top left. First, go to the Concept/Concept 1 tab.

Dataset directory:
C:\Users\username\db-pics\my-captioned-pics

Instance token:
natl
Class token:
woman

For Instance prompt and Class prompt, I just put [filewords].

At this point, I click “Training wizard (Person)”, which reads the number of images in your dataset and fills in “Total Class/Reg images”.

Then I move to the parameters. This will be highly dependent on your hardware; I use a learning rate of 0.0000015 (1.5e-6), and for the rest of the parameters, it’s just a matter of finding a configuration that works for your video card. There are many discussions here that should help you find something that works.

As for using “woman” as the class token earlier: you could use “girl”, “person”, or another term instead, which will change the way your model reacts to future prompts. I studied this helpful Reddit post as a data point, and am testing the actual outputs of different class tokens (e.g. person, woman, man, human, elder, girl, boy, adolescent) to see how each one affects the outputs.

That should be enough to get you started; good luck. I may just copy and paste this into a Reddit post soon in case it helps others.

worthy7 · 2024-03-17T16:20:22Z

worthy7
Mar 17, 2024

Still a bit confused about this. I want to try to make a model that will mimic a specific character/person. @d8ahazard

Let's say we have a dog called Dandy. Is this correct? I do not want to use prompts for each picture.